Abstract

This paper presents some general formulas for random partitions of a finite set 
derived by Kingman's model of random sampling from an interval partition generated 
by subintervals whose lengths are the points of a Poisson point process. These lengths 
can also be interpreted as the jumps of a subordinator, that is, an increasing process
with stationary independent increments. Examples include the two-parameter family
of Poisson-Dirichlet models derived from the Poisson process of jumps of a stable
subordinator. Applications are made to the random partition generated by the lengths
of excursions of a Brownian motion or Brownian bridge conditioned on its local time 
at zero. 
1 Introduction

This paper presents some general formulas for random partitions of a finite set derived by Kingman's model of random sampling from an interval partition generated by subintervals whose lengths are the points of a Poisson point process. Instances and variants of this model have found applications in the diverse fields of population genetics [17, 19], combinatorics, Bayesian statistics [^], ecology [15], statistical physics [11, 12, 13, 55], and computer science [25].
Section 2 recalls some general results for partitions obtained by sampling from a random discrete distribution. These results are then applied in Section 3 to the Poisson-Kingman model. Section 4 discusses three basic operations on Poisson-Kingman models: scaling, exponential tilting, and deletion of classes. Section 5 then develops formulas for specific examples of Poisson-Kingman models. Section 6 recalls the two-parameter family of Poisson-Dirichlet models derived in [50] from the Poisson process of jumps of a stable($\alpha$) subordinator for $0 < \alpha < 1$. Section 7 reviews some results of [41, 46, 5^] relating the two-parameter family to the lengths of excursions of a Markov process whose zero set is the range of a stable subordinator of index $\alpha$. Section 8 provides further detail in the case $\alpha = \frac{1}{2}$, which corresponds to partitioning a time interval by the lengths of excursions of a Brownian motion. As shown in [2, ^], it is this stable($\frac{1}{2}$) model which governs the asymptotic distribution of partitions derived in various ways from random forests, random mappings, and the additive coalescent. See also [^, ^] for further developments in terms of Brownian paths, and [25] for applications to hashing and parking algorithms. This paper is a revision of the earlier preprint [42]. See [48] for a broader context and further developments.



2 Preliminaries 



This section recalls some basic ideas from Kingman's theory of exchangeable random partitions [30, 31], as further developed in [43]. See [4^, 48] for more extensive reviews of these ideas and their applications. Except where otherwise specified, all random variables are assumed to be defined on some background probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $\mathbb{E}$ denotes expectation with respect to $\mathbb{P}$. Let $\mathbb{N} := \{1, 2, \ldots\}$, let $F$ denote a random probability distribution on the line, and let $\Pi$ be a random partition of $\mathbb{N}$ generated by sampling from $F$. That is to say, two positive integers $i$ and $j$ are in the same block of $\Pi$ iff $X_i = X_j$, where conditionally given $F$ the $X_i$ are independent and identically distributed according to $F$. Formally, $\Pi$ is identified with the sequence $(\Pi_n)$, where $\Pi_n$ is the restriction of $\Pi$ to the finite set $\mathbb{N}_n := \{1, \ldots, n\}$. The distribution of $\Pi_n$ is such that for each particular partition $\{A_1, \ldots, A_k\}$ of $\mathbb{N}_n$ with $\#(A_i) = n_i$ for $1 \le i \le k$, where $n_i \ge 1$ and $\sum_{i=1}^k n_i = n$,

$$\mathbb{P}(\Pi_n = \{A_1, \ldots, A_k\}) = p(n_1, \ldots, n_k) \tag{1}$$



for some symmetric function $p$ of sequences of positive integers, called the exchangeable partition probability function (EPPF) of $\Pi$. Conversely, Kingman [30, 31] showed that if $\Pi$ is an exchangeable random partition of $\mathbb{N}$, meaning that the distribution of its restrictions $\Pi_n$ is of the form (1) for every $n$, for some symmetric function $p$, then $\Pi$ has the same distribution as if generated by sampling from some random probability distribution $F$. Let $P_i$ denote the size of the $i$th largest atom of $F$. If $F$ is a random discrete distribution, then $\sum_i P_i = 1$ almost surely, and $\Pi$ is said to have proper frequencies $(P_i)$. In that case, let $\tilde{P}_j$ denote the size of the $j$th atom discovered in the process of random sampling. Put another way, $\tilde{P}_j$ is the asymptotic frequency of the $j$th class of $\Pi$ when the classes are put in order of their least elements. It is assumed now for simplicity that $P_i > 0$ for all $i$ almost surely, and hence $\tilde{P}_j > 0$ for all $j$ almost surely. The sequence $(\tilde{P}_j)$ is a size-biased permutation of $(P_i)$. That is to say, $\tilde{P}_j = P_{\pi_j}$ where for all finite sequences $(i_j, 1 \le j \le k)$ of distinct positive integers, the conditional probability of the event ($\pi_j = i_j$ for all $1 \le j \le k$) given $(P_1, P_2, \ldots)$ is

$$P_{i_1} \, \frac{P_{i_2}}{1 - P_{i_1}} \cdots \frac{P_{i_k}}{1 - P_{i_1} - \cdots - P_{i_{k-1}}}. \tag{2}$$
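The product formula (2) can be illustrated for a distribution with finitely many atoms. The following is only a sketch under that finiteness assumption (the setting above allows infinitely many atoms); the function names `order_probability` and `sample_size_biased_permutation` are hypothetical and not taken from the paper.

```python
import random
from fractions import Fraction
from itertools import permutations

def order_probability(P, order):
    """Probability, as in (2), that size-biased sampling visits atoms in `order`."""
    prob, remaining = Fraction(1), Fraction(1)
    for i in order:
        prob *= P[i] / remaining   # atom i is chosen among the atoms not yet picked
        remaining -= P[i]
    return prob

def sample_size_biased_permutation(P, rng):
    """Draw one size-biased permutation of the indices of a finite vector P."""
    indices = list(range(len(P)))
    weights = [float(p) for p in P]
    order = []
    while indices:
        pick = rng.choices(range(len(indices)), weights=weights)[0]
        order.append(indices.pop(pick))
        weights.pop(pick)
    return order

# Illustrative frequencies (an assumption for the example only).
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
total = sum(order_probability(P, o) for o in permutations(range(3)))
first_is_0 = sum(order_probability(P, o)
                 for o in permutations(range(3)) if o[0] == 0)
```

With exact rational arithmetic the probabilities of all orderings sum to one, and the first atom discovered is atom $i$ with probability $P_i$.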



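The distribution of $\Pi_n$ is determined by the distribution of the sequence of ranked frequencies $(P_i)$ through the distribution of the size-biased permutation $(\tilde{P}_j)$. To be precise, the EPPF $p$ in (1) is given by the formula [43]

$$p(n_1, \ldots, n_k) = \mathbb{E}\left[ \prod_{i=1}^{k} \tilde{P}_i^{\,n_i - 1} \prod_{i=1}^{k-1} \Big( 1 - \sum_{j=1}^{i} \tilde{P}_j \Big) \right]. \tag{3}$$

Alternatively [45]

$$p(n_1, \ldots, n_k) = \mathbb{E}\left[ \sum_{(j_1, \ldots, j_k)} \prod_{i=1}^{k} P_{j_i}^{\,n_i} \right] \tag{4}$$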

where $(j_1, \ldots, j_k)$ ranges over all permutations of $k$ positive integers, that is, all ordered $k$-tuples of distinct positive integers, and the same formula holds with $P_{j_i}$ replaced by $\tilde{P}_{j_i}$. For each $n = 1, 2, \ldots$ the EPPF $p$, when restricted to $(n_1, \ldots, n_k)$ with $\sum_i n_i = n$, determines the distribution of $\Pi_n$. Since $\Pi_n$ is the restriction of $\Pi_{n+1}$ to $\mathbb{N}_n$, the EPPF is subject to the following sequence of addition rules [43]: for $k = 1, 2, \ldots$

$$p(n_1, \ldots, n_k) = \sum_{j=1}^{k} p(\ldots, n_j + 1, \ldots) + p(n_1, \ldots, n_k, 1) \tag{5}$$

where $(\ldots, n_j + 1, \ldots)$ is derived from $(n_1, \ldots, n_k)$ by substituting $n_j + 1$ for $n_j$. The first few rules are

$$1 = p(1) = p(2) + p(1,1) \tag{6}$$

$$p(2) = p(3) + p(2,1); \qquad p(1,1) = 2p(2,1) + p(1,1,1) \tag{7}$$

where $p(2,1) = p(1,2)$ by symmetry of $p$. Let $\mu(q)$ denote the $q$th moment of $\tilde{P}_1$:

$$\mu(q) := \mathbb{E}[\tilde{P}_1^{\,q}] = \int_0^1 p^q \, \tilde{\nu}(dp) \tag{8}$$

where $\tilde{\nu}$ denotes the distribution of $\tilde{P}_1$ on $(0, 1]$. Following Engen [1^], call $\tilde{\nu}$ the structural distribution associated with a random discrete distribution whose size-biased permutation is $(\tilde{P}_j)$, or with an exchangeable random partition $\Pi$ whose sequence of class frequencies is $(\tilde{P}_j)$. The special case of (3) for $k = 1$ and $n_1 = n$ is

$$p(n) = \mathbb{E}[\tilde{P}_1^{\,n-1}] = \mu(n-1) \qquad (n = 1, 2, \ldots). \tag{9}$$



From (6), (7), and (9) the following values of the EPPF are also determined by the first two moments of the structural distribution:

$$p(1,1) = 1 - \mu(1); \qquad p(2,1) = \mu(1) - \mu(2); \qquad p(1,1,1) = 1 - 3\mu(1) + 2\mu(2). \tag{10}$$

So the distribution of the random partition of $\{1, 2, 3\}$ induced by $\Pi$ with class frequencies $(\tilde{P}_i)$ is determined by the first two moments of the structural distribution of $\tilde{P}_1$. It is not true in general that the EPPF is determined for all $(n_1, \ldots, n_k)$ by the structural distribution, because it is possible to construct different distributions for a sequence of ranked frequencies which have the same structural distribution.
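The identities in (10) can be checked exactly in the simplest case of deterministic frequencies with finitely many atoms, where the expectation in (4) is trivial. This is a sketch under that assumption; the helper names `eppf` and `mu` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def eppf(ns, P):
    """Formula (4) for deterministic P: sum over ordered tuples of distinct atoms."""
    total = Fraction(0)
    for idx in permutations(range(len(P)), len(ns)):
        term = Fraction(1)
        for j, n in zip(idx, ns):
            term *= P[j] ** n
        total += term
    return total

def mu(q, P):
    """q-th moment of the first size-biased pick, which equals P[i] with probability P[i]."""
    return sum(p ** (q + 1) for p in P)

# Illustrative frequencies (an assumption for the example only).
P = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
p11, p21, p111 = eppf((1, 1), P), eppf((2, 1), P), eppf((1, 1, 1), P)
```

Each of `p11`, `p21`, and `p111` then agrees with the corresponding expression in $\mu(1)$ and $\mu(2)$ from (10).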

Continuing to suppose that $(P_i)$ is the sequence of ranked atoms of a random discrete probability distribution, and that $(\tilde{P}_j)$ is a size-biased permutation of $(P_i)$, for an arbitrary non-negative measurable function $f$, there is the well known formula

$$\mathbb{E}\Big[ \sum_{i} f(P_i) \Big] = \mathbb{E}\Big[ \sum_{j} f(\tilde{P}_j) \Big] = \mathbb{E}\left[ \frac{f(\tilde{P}_1)}{\tilde{P}_1} \right] = \int_0^1 \frac{f(p)}{p} \, \tilde{\nu}(dp). \tag{11}$$

This formula shows that the structural distribution $\tilde{\nu}$ encodes much information about the entire sequence of random frequencies. Taking $f$ in (11) to be the indicator of a subset $B$ of $(0, 1]$, the quantity in (11) is $\nu(B) = \int_B p^{-1} \tilde{\nu}(dp)$. This measure $\nu$ is the mean intensity measure of the point process with a point at each $P_j \in (0, 1]$. For $x > \frac{1}{2}$ there can be at most one $P_j > x$, so the structural distribution $\tilde{\nu}$ determines the distribution of $P_1 = \max_j \tilde{P}_j$ on $(\frac{1}{2}, 1]$ via the formula

$$\mathbb{P}(P_1 > x) = \nu(x, 1] = \int_{(x,1]} p^{-1} \, \tilde{\nu}(dp) \qquad (x > \tfrac{1}{2}). \tag{12}$$



Typically, formulas for $\mathbb{P}(P_1 > x)$ get progressively more complicated on the intervals $(\frac{1}{3}, \frac{1}{2}]$, $(\frac{1}{4}, \frac{1}{3}]$, $\ldots$. See for instance [^, 50].

A random variable of interest in many applications is the sum of $m$th powers of frequencies

$$S_m := \sum_{i=1}^{\infty} P_i^m = \sum_{i=1}^{\infty} \tilde{P}_i^m \qquad (m = 1, 2, \ldots)$$

where it is still assumed that $S_1 = 1$ almost surely. Let $\pi := \{A_1, \ldots, A_k\}$ be some particular partition of $\mathbb{N}_n$ with $\#(A_i) = n_i$ for $1 \le i \le k$, and consider the event $(\Pi_n \ge \pi)$, meaning that each block of $\Pi_n$ is some union of blocks of $\pi$. Then it is easily shown that

$$\mathbb{P}(\Pi_n \ge \pi) = \mathbb{E}\Big[ \prod_{i=1}^{k} S_{n_i} \Big] = \sum_{j=1}^{k} \sum_{\{B_1, \ldots, B_j\}} p(n_{B_1}, \ldots, n_{B_j}) \tag{13}$$



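where the second sum is over partitions $\{B_1, \ldots, B_j\}$ of $\mathbb{N}_k$, and $n_B := \sum_{i \in B} n_i$. In particular, for $n_i \equiv m$ this gives an expression for the $k$th moment of $S_m$ for each $k = 1, 2, \ldots$:

$$\mathbb{E}[S_m^k] = \sum_{j=1}^{k} \frac{k!}{j!} \sum_{(k_1, \ldots, k_j)} \Big( \prod_{i=1}^{j} \frac{1}{k_i!} \Big) \, p(m k_1, \ldots, m k_j) \tag{14}$$

where the second sum is over all sequences of $j$ positive integers $(k_1, \ldots, k_j)$ with $k_1 + \cdots + k_j = k$. Thus the EPPF associated with a random discrete distribution directly determines the positive integer moments of the power sums $S_m$, hence the distribution of $S_m$ (which is determined by its moments because $0 \le S_m \le 1$), for each $m$.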
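Formula (14) can be verified exactly when the frequencies are deterministic with finitely many atoms, so that $\mathbb{E}[S_m^k] = S_m^k$ and the EPPF is the plain sum in (4). The sketch below makes that assumption; `eppf`, `compositions`, and `power_sum_moment` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def eppf(ns, P):
    """Formula (4) for deterministic P: sum over ordered tuples of distinct atoms."""
    total = Fraction(0)
    for idx in permutations(range(len(P)), len(ns)):
        term = Fraction(1)
        for j, n in zip(idx, ns):
            term *= P[j] ** n
        total += term
    return total

def compositions(k, j):
    """All sequences of j positive integers with sum k."""
    if j == 1:
        yield (k,)
        return
    for first in range(1, k - j + 2):
        for rest in compositions(k - first, j - 1):
            yield (first,) + rest

def power_sum_moment(m, k, P):
    """Right side of (14), the k-th moment of S_m expressed through the EPPF."""
    total = Fraction(0)
    for j in range(1, k + 1):
        for ks in compositions(k, j):
            weight = Fraction(factorial(k), factorial(j))
            for ki in ks:
                weight /= factorial(ki)
            total += weight * eppf(tuple(m * ki for ki in ks), P)
    return total

# Illustrative frequencies (an assumption for the example only).
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
S2 = sum(p ** 2 for p in P)
```

Here `power_sum_moment(2, 3, P)` reproduces `S2 ** 3` exactly, and terms with more blocks than atoms contribute zero.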



3 The Poisson-Kingman Model 

Following McCloskey [^], Kingman [^], Engen [1^], and Perman-Pitman-Yor, consider the ranked random discrete distribution $(P_i) := (J_i/T)$ derived from an inhomogeneous Poisson point process of random lengths $J_1 \ge J_2 \ge \cdots \ge 0$ by normalizing these lengths by their sum $T := \sum_{i=1}^{\infty} J_i$. So it is assumed that the number $N_I$ of $J_i$ that fall in an interval $I$ is a Poisson variable with mean $\Lambda(I)$, for some Lévy measure $\Lambda$ on $(0, \infty)$, and the counts $N_{I_1}, \ldots, N_{I_k}$ are independent for every finite collection of disjoint intervals $I_1, \ldots, I_k$. It is also assumed that

$$\int_0^1 x \, \Lambda(dx) < \infty \quad \text{and} \quad \Lambda[1, \infty) < \infty$$

to ensure that $\mathbb{P}(T < \infty) = 1$. The sequence $(P_i)$ may be regarded as a random element of the space $\mathcal{P}_1^{\downarrow}$ of decreasing sequences of positive real numbers with sum 1. Throughout this section, the following further assumption is made to ensure that various conditional probabilities can be defined without quibbling about null sets:

Regularity assumption. The Lévy measure $\Lambda$ has a density $\rho(x)$ such that the distribution of $T$ is absolutely continuous with density

$$f(t) := \mathbb{P}(T \in dt)/dt$$

which is strictly positive and continuous on $(0, \infty)$.






J. Pitman 



Note that the regularity assumption implies the total mass of the Lévy measure is infinite:

$$\int_0^{\infty} \rho(x) \, dx = \infty. \tag{15}$$

The results described below also have weaker forms for a Lévy density $\rho(x)$ just subject to (15), with appropriate caveats about almost everywhere defined conditional probabilities.

It is well known that $f$ is uniquely determined by $\rho$ via the Laplace transform

$$\mathbb{E}(e^{-\lambda T}) = \int_0^{\infty} e^{-\lambda x} f(x) \, dx = \exp[-\psi(\lambda)] \qquad (\lambda \ge 0) \tag{16}$$

where, according to the Lévy-Khintchine formula,

$$\psi(\lambda) = \int_0^{\infty} (1 - e^{-\lambda x}) \rho(x) \, dx. \tag{17}$$

Alternatively, $f$ is the unique solution of the following integral equation, which can be derived from (16) and (17) by differentiation with respect to $\lambda$:

$$f(t) = \int_0^t \rho(v) f(t - v) \frac{v}{t} \, dv. \tag{18}$$
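The integral equation (18) can be checked numerically in one concrete case. The sketch below assumes, purely for illustration, the gamma example $\rho(x) = \theta x^{-1} e^{-bx}$ with $T$ having the gamma($\theta$, $b$) density; the parameter values and function names are assumptions, and the integral is approximated by a simple midpoint rule.

```python
import math

def levy_density(x, theta, b):
    """Assumed example: gamma-process Levy density theta * x^{-1} * exp(-b x)."""
    return theta * math.exp(-b * x) / x

def gamma_density(t, theta, b):
    """Density of T for the assumed example: gamma(theta, b)."""
    return b ** theta * t ** (theta - 1) * math.exp(-b * t) / math.gamma(theta)

def rhs_of_18(t, theta, b, steps=20000):
    """Midpoint-rule approximation of the integral on the right side of (18)."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += levy_density(v, theta, b) * gamma_density(t - v, theta, b) * v / t
    return total * h

theta, b, t = 2.5, 1.3, 0.8          # illustrative values only
lhs = gamma_density(t, theta, b)
rhs = rhs_of_18(t, theta, b)
```

For these values the two sides agree to within the quadrature error of the midpoint rule.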

Let $(\tilde{P}_j)$ be a size-biased permutation of the normalized lengths $(P_i) := (J_i/T)$ and let $(\tilde{J}_j) = (T \tilde{P}_j)$ be the corresponding size-biased permutation of the ranked lengths $(J_i)$. Then (18) admits the following probabilistic interpretation [37, 41]:

$$\mathbb{P}(\tilde{J}_1 \in dv, \, T \in dt) = \rho(v) \, dv \, f(t - v) \, dt \, \frac{v}{t}. \tag{19}$$

This can be understood as follows. The left side of (19) is the probability that among the Poisson lengths there is some length in $dv$ near $v$, and the sum of the rest of the lengths falls in an interval of length $dt$ near $t - v$, and finally that the interval of length about $v$ is the one picked by length-biased sampling. Formally, (19) is justified by the description of a Poisson process in terms of its Palm measures [^].

The following two Lemmas are read from [41, Theorem 2.1]. The first Lemma is immediate from (19), and the second is obtained by a similar Palm calculation.

Lemma 1 For each $t > 0$ the formula

$$\tilde{f}(p \mid t) := p \, t \, \rho(pt) \, \frac{f(\bar{p} t)}{f(t)} \qquad (0 < p < 1; \ \bar{p} := 1 - p), \tag{20}$$

where $\rho$ is the density of the Lévy measure of $T$ and $f$ is the probability density of $T$, defines a function of $p$ which is a probability density on $(0, 1)$. This is the density of the structural distribution of $\tilde{P}_1 := \tilde{J}_1/T$ given $T = t$:

$$\mathbb{P}(\tilde{P}_1 \in dp \mid T = t) = \tilde{f}(p \mid t) \, dp \qquad (0 < p < 1). \tag{21}$$
Lemma 2 [^] For $j = 0, 1, 2, \ldots$ let

$$T_j := T - \sum_{k=1}^{j} \tilde{J}_k = \sum_{k=j+1}^{\infty} \tilde{J}_k \tag{22}$$

which is the total length remaining after removal of the first $j$ Poisson lengths $\tilde{J}_1, \ldots, \tilde{J}_j$ chosen by length-biased sampling. Then the family of densities (20) on $(0, 1)$, parameterized by $t > 0$, provides the conditional density of the random variable

$$G_{j+1} := \frac{\tilde{J}_{j+1}}{T_j} = \frac{\tilde{P}_{j+1}}{\tilde{P}_{j+1} + \tilde{P}_{j+2} + \cdots}$$

given $T_0, \ldots, T_j$ via the formula

$$\mathbb{P}(G_{j+1} \in dp \mid T_0, \ldots, T_j) = \tilde{f}(p \mid T_j) \, dp \qquad (0 < p < 1). \tag{23}$$

Lemma 2 provides an explicit construction of a regular conditional distribution for $(\tilde{P}_j)$ given $T = t$ for arbitrary $t > 0$. This conditional distribution of $(\tilde{P}_j)$ given $T = t$ determines corresponding conditional distributions for the $\mathcal{P}_1^{\downarrow}$-valued ranked sequence $(P_i)$ and for an associated random partition $\Pi$ of $\mathbb{N}$.

Definition 3 The distribution of $(P_i) := (J_i/T)$ on the space $\mathcal{P}_1^{\downarrow}$ of ranked sequences, determined by the ranked points $J_i$ of a Poisson process with Lévy density $\rho$, will be called the Poisson-Kingman distribution with Lévy density $\rho$, and denoted $\mathrm{PK}(\rho)$. Denote by $\mathrm{PK}(\rho \mid t)$ the regular conditional distribution of $(P_i)$ given $(T = t)$ constructed above. For a probability distribution $\gamma$ on $(0, \infty)$, let

$$\mathrm{PK}(\rho, \gamma) := \int_0^{\infty} \mathrm{PK}(\rho \mid t) \, \gamma(dt) \tag{24}$$

be the distribution on $\mathcal{P}_1^{\downarrow}$ obtained by mixing the $\mathrm{PK}(\rho \mid t)$ with respect to $\gamma(dt)$. Call $\mathrm{PK}(\rho, \gamma)$ the Poisson-Kingman distribution with Lévy density $\rho$ and mixing distribution $\gamma$.

Note that $\mathrm{PK}(\rho \mid t) = \mathrm{PK}(\rho, \delta_t)$, where $\delta_t$ is a unit mass at $t$, and that $\mathrm{PK}(\rho) = \mathrm{PK}(\rho, \gamma)$ for $\gamma(dt) = f(t) \, dt$. A formula for the joint density of $(P_1, \ldots, P_n)$ for $(P_i)$ with $\mathrm{PK}(\rho \mid t)$ distribution was obtained by Perman [^] in terms of the joint density $p_1(t, x)$ of $T$ and $J_1$. This function can be described in terms of $\rho$ and $f$ as the solution of an integral equation [^], or as a series of repeated integrals [^]. But this formula will not be used here.

For a probability distribution $Q$ on $\mathcal{P}_1^{\downarrow}$, such as $Q = \mathrm{PK}(\rho, \gamma)$, a random partition $\Pi$ of $\mathbb{N}$ will be called a $Q$-partition if $\Pi$ is an exchangeable random partition of $\mathbb{N}$ whose ranked class frequencies are distributed according to $Q$. Immediately from Definition 3, the structural distribution of a $\mathrm{PK}(\rho, \gamma)$-partition $\Pi$ of $\mathbb{N}$, that is the distribution on $(0, 1)$ of the frequency $\tilde{P}_1$ of the class of $\Pi$ containing 1, has density

$$\mathbb{P}(\tilde{P}_1 \in dp)/dp = \int_0^{\infty} \tilde{f}(p \mid t) \, \gamma(dt) \qquad (0 < p < 1) \tag{25}$$

where $\tilde{f}(p \mid t)$ given by (20) is the density of the structural distribution of $\tilde{P}_1$ given $T = t$ in the basic Poisson construction. Similarly, the EPPF of $\Pi$ is

$$p(n_1, \ldots, n_k) = \int_0^{\infty} p(n_1, \ldots, n_k \mid t) \, \gamma(dt) \tag{26}$$

where $p(n_1, \ldots, n_k \mid t)$, the EPPF of a $\mathrm{PK}(\rho \mid t)$-partition, is determined as follows:



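Theorem 4 The EPPF of a $\mathrm{PK}(\rho \mid t)$-partition is given by the formula

$$p(n_1, \ldots, n_k \mid t) = t^{k-1} \int_0^1 p^{\,n+k-2} \, I(n_1, \ldots, n_k; tp) \, \tilde{f}(p \mid t) \, dp \tag{27}$$

where $n = \sum_i n_i$, $I(n; v) = 1$ if $k = 1$ and $n_1 = n$, and for $k = 2, 3, \ldots$

$$I(n_1, \ldots, n_k; v) := \frac{1}{\rho(v)} \int_{\mathcal{S}_k} \Big[ \prod_{i=1}^{k} \rho(v u_i) \, u_i^{n_i} \Big] \, du_1 \cdots du_{k-1} \tag{28}$$

where $\mathcal{S}_k$ is the simplex $\{(u_1, \ldots, u_k) : u_i \ge 0 \text{ and } u_1 + \cdots + u_k = 1\}$.

Proof. In view of the formula (20) for $\tilde{f}(p \mid t)$, the formula (27) is obtained from formula (31) in the following Lemma by dividing by $f(t) \, dt$, letting $p = \sum_{i=1}^{k} x_i / t$, and integrating out with respect to $p$ and to $u_i = x_i/(pt)$ for $1 \le i \le k - 1$. □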



The change of variables $v = tp$ gives the following variant of formula (27), whose connection to the next lemma is a bit more obvious:

$$p(n_1, \ldots, n_k \mid t) = \int_0^t dv \, \frac{f(t - v)}{t^n f(t)} \, v^{\,n+k-1} \, I(n_1, \ldots, n_k; v) \, \rho(v). \tag{29}$$



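Lemma 5 Let $\Pi_n$ be the restriction to $\mathbb{N}_n$ of a $\mathrm{PK}(\rho)$ partition $\Pi$ whose class frequencies (in order of least elements) are $\tilde{P}_j = \tilde{J}_j/T$, where $T = \sum_j \tilde{J}_j$ has density $f$, and the lengths $\tilde{J}_j$ are the points of a Poisson process of lengths with intensity $\rho$, in length-biased random order. Then for each partition $\{A_1, \ldots, A_k\}$ of $\mathbb{N}_n$ such that $\#(A_i) = n_i$ for $1 \le i \le k$,

$$\mathbb{P}(\Pi_n = \{A_1, \ldots, A_k\}, \ \tilde{J}_i \in dx_i \text{ for } 1 \le i \le k, \ T \in dt) \tag{30}$$

$$= t^{-n} f\Big(t - \sum_{i=1}^{k} x_i\Big) \, dt \prod_{i=1}^{k} \rho(x_i) \, x_i^{n_i} \, dx_i. \tag{31}$$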
Proof. This can be derived by evaluation of the expectation (3) for the joint distribution of $\tilde{P}_1, \ldots, \tilde{P}_k$ given $T = t$ determined by Lemma 2. Alternatively, there is the following more intuitive argument, which can be made rigorous using the characterization of a Poisson process by its Palm measures, as in [^], [^]. Let $\Pi$ be constructed as in [^] using random intervals $I_i$ laid down on $[0, T]$ in some arbitrary random order, where the lengths $J_i := |I_i|$ are the ranked points of the Poisson process with intensity $\rho(x)$, and $T = \sum_i J_i$. Let $U_1, U_2, \ldots$ be i.i.d. uniform on $(0, 1)$ independent of this construction. Let $\Pi$ be the partition of $\mathbb{N}$ generated by the random equivalence relation $n \sim m$ iff either $n = m$ or $T U_n$ and $T U_m$ fall in the same interval $I_i$ for some $i$. Then by construction, $\Pi$ is a $\mathrm{PK}(\rho)$ partition. For the event in (30) to occur,

(i) there must be some Poisson point in $dx_i$ for each $1 \le i \le k$, and

(ii) given (i), the sum of the rest of the Poisson points must fall in an interval of length $dt$ near $t - \sum_{i=1}^{k} x_i$,

(iii) given (i) and (ii), for each $1 \le i \le k$ and each $m \in A_i$ the sample point $T U_m$ must fall in the interval of length $x_i$.

The infinitesimal probability in (30) therefore equals

$$\Big( \prod_{i=1}^{k} \rho(x_i) \, dx_i \Big) \, f\Big(t - \sum_{i=1}^{k} x_i\Big) \, dt \, \prod_{i=1}^{k} \Big( \frac{x_i}{t} \Big)^{n_i} \tag{32}$$

which rearranges as (31). □
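The sampling construction used in this proof can be sketched in code for a finite list of interval lengths. This is only an illustration: the Poisson model has infinitely many lengths, so the finite input is an assumption, and the function name `partition_from_lengths` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def partition_from_lengths(lengths, n, rng):
    """Partition {1,...,n}: m and m' share a block iff T*U_m and T*U_m' hit the same interval."""
    ends = list(accumulate(lengths))   # right endpoints of the intervals laid down on [0, T]
    total = ends[-1]                   # T, the sum of the lengths
    blocks = {}                        # interval index -> members, keyed in order of discovery
    for m in range(1, n + 1):
        point = total * rng.random()   # the sample point T * U_m
        interval = min(bisect_right(ends, point), len(lengths) - 1)
        blocks.setdefault(interval, []).append(m)
    return list(blocks.values())       # blocks listed in order of their least elements

rng = random.Random(12345)
blocks = partition_from_lengths([0.5, 1.5, 0.25, 0.75], 10, rng)
```

Because Python dictionaries preserve insertion order, the returned blocks appear in order of their least elements, which is the order in which the corresponding intervals are discovered by length-biased sampling.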



The formula (27) expresses $p(n_1, \ldots, n_k \mid t)$ as the expectation of a function of $\tilde{P}_1$ given $T = t$, where the function depends on $t$ and $n_1, \ldots, n_k$. Because some values of an EPPF can always be expressed as moments of $\tilde{P}_1$, as in (9) and (10), it seems natural to try to express an EPPF similarly whenever possible. This idea serves as a guide to simplifying calculations in a number of particular cases treated later. The integrations in (27) and (28) are essentially convolutions, which can be expressed or evaluated in various ways. Consider for instance the length $T_k := T - \sum_{i=1}^{k} \tilde{J}_i$ which remains after removal of the first $k$ lengths discovered by the sampling process. Then the formula of Lemma 5 can be recast as

$$\mathbb{P}(\Pi_n = \{A_1, \ldots, A_k\}, \ \tilde{J}_i \in dx_i \text{ for } 1 \le i \le k, \ T_k \in dv) \tag{33}$$

$$= \Big( v + \sum_{i=1}^{k} x_i \Big)^{-n} f(v) \, dv \prod_{i=1}^{k} \rho(x_i) \, x_i^{n_i} \, dx_i \tag{34}$$

which yields the following integrated forms of (p^): 

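Corollary 6 The EPPF of a $\mathrm{PK}(\rho)$-partition is given by the formula

$$p(n_1, \ldots, n_k) = \int_0^{\infty} \cdots \int_0^{\infty} \frac{f(v) \, dv \prod_{i=1}^{k} \rho(x_i) \, x_i^{n_i} \, dx_i}{\big( v + \sum_{i=1}^{k} x_i \big)^{n}} \tag{35}$$

where $n := \sum_{i=1}^{k} n_i$, and again by

$$p(n_1, \ldots, n_k) = \frac{(-1)^{n-k}}{\Gamma(n)} \int_0^{\infty} \lambda^{n-1} \, d\lambda \, e^{-\psi(\lambda)} \prod_{i=1}^{k} \psi_{n_i}(\lambda) \tag{36}$$

where $\psi(\lambda) := \int_0^{\infty} (1 - e^{-\lambda x}) \rho(x) \, dx$ is the Laplace exponent as in (17), and

$$\psi_m(\lambda) := \frac{d^m}{d\lambda^m} \psi(\lambda) = (-1)^{m-1} \int_0^{\infty} x^m e^{-\lambda x} \rho(x) \, dx \qquad (m = 1, 2, \ldots). \tag{37}$$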







Proof. Formula (34) yields (35) by integration, and (36) follows after applying the formula $b^{-n} = \Gamma(n)^{-1} \int_0^{\infty} \lambda^{n-1} e^{-\lambda b} \, d\lambda$ to $b = v + \sum_{i=1}^{k} x_i$. □
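As a worked check of (36), one can take the gamma Lévy density $\rho(x) = \theta x^{-1} e^{-bx}$ with $\theta, b > 0$ treated in Section 5.1. The following computation is a sketch under that assumption, not part of the original argument; it recovers the EPPF stated there as (49).

```latex
% Assumed example: \rho(x) = \theta x^{-1} e^{-bx}, so that by (17) and (37)
\psi(\lambda) = \theta \log\Big(1 + \frac{\lambda}{b}\Big), \qquad
\psi_m(\lambda) = (-1)^{m-1} \, \theta \, (m-1)! \, (b + \lambda)^{-m}.
% The signs (-1)^{n_i - 1} multiply to (-1)^{n-k} and cancel the prefactor in (36):
p(n_1, \ldots, n_k)
  = \frac{\theta^k \prod_{i=1}^{k} (n_i - 1)!}{\Gamma(n)} \,
    b^{\theta} \int_0^{\infty} \lambda^{n-1} (b + \lambda)^{-\theta - n} \, d\lambda .
% The remaining integral is a beta integral:
\int_0^{\infty} \lambda^{n-1} (b + \lambda)^{-\theta - n} \, d\lambda
  = b^{-\theta} \, \frac{\Gamma(n) \, \Gamma(\theta)}{\Gamma(n + \theta)},
\qquad \text{hence} \qquad
p(n_1, \ldots, n_k) = \frac{\theta^k \, \Gamma(\theta)}{\Gamma(\theta + n)} \prod_{i=1}^{k} (n_i - 1)! .
```

The scale parameter $b$ cancels, in line with the scaling remarks of Section 4.1.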



These integrated forms (35) and (36) also hold more generally, with $f(v) \, dv$ replaced by $\mathbb{P}(T \in dv)$, and $\rho(x) \, dx$ replaced by the corresponding Lévy measure on $(0, \infty)$, assuming only that the Lévy measure has infinite total mass.

Provided $\mathbb{E}(e^{\varepsilon T}) < \infty$ for some $\varepsilon > 0$, the Laplace exponent $\psi$ can be expanded in a neighbourhood of 0 as

$$\psi(\lambda) = - \sum_{m=1}^{\infty} \kappa_m \frac{(-\lambda)^m}{m!}$$

where the cumulants $\kappa_m$ of $T$ are the moments of the Lévy measure

$$\kappa_m = (-1)^{m-1} \psi_m(0) = \int_0^{\infty} x^m \rho(x) \, dx.$$

Then for each partition $\{A_1, \ldots, A_k\}$ of $\mathbb{N}_n$ such that $\#(A_i) = n_i$ for $1 \le i \le k$, Lemma 5 yields the formula

$$\mathbb{P}(\Pi_n = \{A_1, \ldots, A_k\}, \ T \in dt) = t^{-n} \, \mathbb{P}\Big( T + \sum_{i=1}^{k} J_{i,n_i} \in dt \Big) \prod_{i=1}^{k} \kappa_{n_i} \tag{38}$$

where $J_{i,m}$ denotes a random length distributed according to the Lévy density tilted by $x^m$:

$$\mathbb{P}(J_{i,m} \in dx) = \kappa_m^{-1} \rho(x) \, x^m \, dx$$

and $T$ and the $J_{i,n_i}$ for $1 \le i \le k$ are assumed to be independent. If $f_{n_1, \ldots, n_k}(t)$ denotes the probability density of $T + \sum_{i=1}^{k} J_{i,n_i}$, then formula (27) for the EPPF of a $\mathrm{PK}(\rho \mid t)$-partition can be rewritten

$$p(n_1, \ldots, n_k \mid t) = \frac{f_{n_1, \ldots, n_k}(t)}{t^n f(t)} \prod_{i=1}^{k} \kappa_{n_i} \tag{39}$$

and formula (35) for the EPPF of a $\mathrm{PK}(\rho)$-partition becomes

$$p(n_1, \ldots, n_k) = \mathbb{E}\Big[ \Big( T + \sum_{i=1}^{k} J_{i,n_i} \Big)^{-n} \Big] \prod_{i=1}^{k} \kappa_{n_i}. \tag{40}$$

See also James [23] for closely related formulas, with applications to Bayesian nonparametric inference.

4 Operations 

Later discussion of specific examples of Poisson-Kingman partitions will be guided 
by a number of basic operations on Levy densities p and their associated families of 
partitions. 

4.1 Scaling 

By an obvious scaling argument, the $\mathrm{PK}(\rho)$ and $\mathrm{PK}(\rho')$ distributions are identical whenever $\rho'(x) = b \rho(bx)$ is a rescaling of $\rho$ for some $b > 0$. The converse is less obvious, but true [49, Lemma 7.5].



4.2 Exponential tilting 

It is elementary that if $\rho$ is a Lévy density, corresponding to a density $f$ for $T$, and $b$ is a real number such that $\psi(b)$ defined by (17) is finite, then

$$\rho^{(b)}(x) = \rho(x) e^{-bx} \tag{41}$$

is also a Lévy density, and the corresponding density of $T$ is

$$f^{(b)}(t) = f(t) e^{\psi(b) - bt}. \tag{42}$$

It is also well known [34, Proposition 2.1.3] that if $\mathbb{P}^{(b)}$ denotes the probability distribution governing the Poisson set up with Lévy density $\rho^{(b)}$ then (42) extends to the absolute continuity relation

$$\frac{d\mathbb{P}^{(b)}}{d\mathbb{P}} = e^{\psi(b) - bT}. \tag{43}$$

This relation is equivalent to a combination of (42) and the following identity, which can also be verified using the construction of Lemma 2:

$$\mathrm{PK}(\rho^{(b)} \mid t) = \mathrm{PK}(\rho \mid t) \quad \text{for all } t > 0. \tag{44}$$

Consequently

$$\mathrm{PK}(\rho^{(b)}, \gamma) = \mathrm{PK}(\rho, \gamma) \tag{45}$$

for every $\gamma$. In particular, the distribution on $\mathcal{P}_1^{\downarrow}$ derived from the unconditioned Poisson model with Lévy density $\rho^{(b)}$ is

$$\mathrm{PK}(\rho^{(b)}) = \mathrm{PK}(\rho, \gamma^{(b)}) \tag{46}$$

where $\gamma^{(b)}$ is the $\mathbb{P}^{(b)}$ distribution of $T$, that is $\gamma^{(b)}(dt) = f^{(b)}(t) \, dt$ for $f^{(b)}$ as in (42). It can also be shown that if $\rho'$ and $\rho$ are two regular Lévy densities such that $\mathrm{PK}(\rho') = \mathrm{PK}(\rho, \gamma)$ for some $\gamma$, then $\rho' = \rho^{(b)}$ and $\gamma = \gamma^{(b)}$ for some $b$.



4.3 Deletion of Classes. 

The following proposition, which generalizes a result of [^], provides motivation for study of $\mathrm{PK}(\rho, \gamma)$-partitions for other distributions $\gamma$ besides $\gamma(dt) = f(t) \, dt$ corresponding to the unconditioned Poisson set up, and $\gamma = \delta_t$ corresponding to conditioning on $T = t$. Given a random partition $\Pi$ of $\mathbb{N}$ with infinitely many classes, for each $k = 0, 1, \ldots$ let $\Pi_k$ be the partition of $\mathbb{N}$ derived from $\Pi$ by deletion of the first $k$ classes, an operation made precise as follows. First let $\Pi'_k$ be the restriction of $\Pi$ to $H_k := \mathbb{N} - G_1 - \cdots - G_k$ where $G_1, \ldots, G_k$ are the first $k$ classes of $\Pi$ in order of least elements, then derive $\Pi_k$ on $\mathbb{N}$ from $\Pi'_k$ on $H_k$ by renumbering the points of $H_k$ in increasing order.

Proposition 7 Let $\Pi$ be a $\mathrm{PK}(\rho, \gamma)$-partition of $\mathbb{N}$, and let $\Pi_k$ be derived from $\Pi$ by deletion of its first $k$ classes. Then $\Pi_k$ is a $\mathrm{PK}(\rho, \gamma_k)$-partition of $\mathbb{N}$, where $\gamma_k = \gamma Q^k$ for $Q$ the Markov transition operator on $(0, \infty)$

$$Q(t, dv) = \rho(t - v) \, (t - v) \, t^{-1} \, \frac{f(v)}{f(t)} \, 1(0 < v < t) \, dv.$$

In particular, if $\Pi$ is a $\mathrm{PK}(\rho)$ partition of $\mathbb{N}$, then $\Pi_k$ is a $\mathrm{PK}(\rho, \gamma_k)$-partition of $\mathbb{N}$, where $\gamma_k$ is the distribution of $T_k$, the total sum of Poisson lengths $T$ minus the sum of the first $k$ lengths discovered by a process of length-biased sampling, as in (22).



Proof. According to a result of [41] which is implicit in Lemma 2, the sequence $(T_k)$ is Markov with stationary transition probabilities given by $Q$. The conclusion follows from this observation, the construction of $\mathrm{PK}(\rho, \gamma)$, and the general construction of an exchangeable partition of $\mathbb{N}$ conditionally given its class frequencies [43].
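To see what the transition operator $Q$ looks like in a concrete case, one can again take the gamma example of Section 5.1. The following is a sketch under that assumption, using $Q(t, dv) = \rho(t-v)(t-v) t^{-1} f(v)/f(t) \, dv$ on $0 < v < t$, which is the conditional law of $T_1 = T - \tilde{J}_1$ given $T = t$ obtained from (19).

```latex
% Assumed example: \rho(x) = \theta x^{-1} e^{-bx}, \quad
% f(t) = \frac{b^\theta}{\Gamma(\theta)} t^{\theta - 1} e^{-bt}.
Q(t, dv)
  = \theta \, e^{-b(t - v)} \, t^{-1} \,
    \frac{v^{\theta - 1} e^{-bv}}{t^{\theta - 1} e^{-bt}} \, dv
  = \theta \, \frac{v^{\theta - 1}}{t^{\theta}} \, dv
  \qquad (0 < v < t).
% This kernel has total mass one:
\int_0^t \theta \, \frac{v^{\theta - 1}}{t^{\theta}} \, dv = 1 .
```

So in this example $T_1 / T$ has the beta($\theta$, 1) distribution whatever the value of $T$, and neither the kernel nor this ratio depends on $b$.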



5 Examples 

5.1 The one-parameter Poisson-Dirichlet distribution. 

Following Kingman [^], for the particular choice

$$\rho(x) = \theta \, x^{-1} e^{-bx} \tag{47}$$

where $\theta > 0$ and $b > 0$, corresponding to $T$ with the gamma($\theta$, $b$) density

$$f(t) = \frac{b^{\theta}}{\Gamma(\theta)} \, t^{\theta - 1} e^{-bt}, \tag{48}$$

the $\mathrm{PK}(\rho)$ distribution is the Poisson-Dirichlet distribution with parameter $\theta$, abbreviated $\mathrm{PD}(\theta)$. Note the lack of dependence on the inverse scale parameter $b$. The well known fact that the structural distribution of $\mathrm{PD}(\theta)$ is beta($1$, $\theta$) follows immediately from (20). It follows easily from any one of the previous general formulas (27), (35), (36) or (40), that the EPPF of a $\mathrm{PD}(\theta)$-partition $\Pi = (\Pi_n)$ is given by the formula

$$p_{\theta}(n_1, \ldots, n_k) = \frac{\theta^k \, \Gamma(\theta)}{\Gamma(\theta + n)} \prod_{i=1}^{k} (n_i - 1)! \qquad \Big( n = \sum_{i=1}^{k} n_i \Big). \tag{49}$$

This is a known equivalent [32, 43] of the Ewens sampling formula [18, ^] for the joint distribution of the number of blocks of $\Pi_n$ of various sizes. It is also known [41, ^] that the following conditions on $\rho$ are equivalent:

(i) $\rho$ is of the form (47), for some $b > 0$, $\theta > 0$;

(ii) $\mathrm{PK}(\rho \mid t) = \mathrm{PK}(\rho)$ for all $t > 0$;

(iii) $\mathrm{PK}(\rho) = \mathrm{PD}(\theta)$ for some $\theta > 0$;

(iv) a $\mathrm{PK}(\rho)$-partition has EPPF of the form (49) for some $\theta > 0$.

See also [^] for further properties and applications of $\mathrm{PD}(\theta)$.
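The EPPF of the one-parameter Poisson-Dirichlet model, $\theta^k \Gamma(\theta) / \Gamma(\theta + n) \prod_i (n_i - 1)!$, can be checked against the addition rules (5) with exact rational arithmetic. This is a minimal sketch; the function names `ewens_eppf` and `addition_rule_rhs` and the value of $\theta$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def ewens_eppf(ns, theta):
    """theta^k * Gamma(theta) / Gamma(theta + n) * prod_i (n_i - 1)!, as in (49)."""
    n, k = sum(ns), len(ns)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i            # builds Gamma(theta + n) / Gamma(theta)
    value = Fraction(theta) ** k / rising
    for ni in ns:
        value *= factorial(ni - 1)
    return value

def addition_rule_rhs(ns, theta):
    """Right side of the addition rule (5) for the block sizes ns."""
    total = ewens_eppf(ns + (1,), theta)            # item n + 1 starts a new block
    for j in range(len(ns)):
        bumped = ns[:j] + (ns[j] + 1,) + ns[j + 1:]
        total += ewens_eppf(bumped, theta)          # item n + 1 joins block j
    return total

theta = Fraction(3, 2)   # illustrative value only
```

With `theta = 3/2`, for instance, `ewens_eppf((1, 1), theta)` equals $\theta/(1+\theta) = 3/5$, which matches $1 - \mu(1)$ for the beta($1$, $\theta$) structural distribution, whose mean is $1/(1+\theta)$.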

5.2 Generalized gamma 

After the one-parameter Poisson-Dirichlet family, the next simplest Lévy density $\rho$ to consider is

$$\rho_{\alpha,c,b}(x) = c \, x^{-\alpha - 1} e^{-bx} \tag{50}$$

for positive constants $c$ and $b$, and $\alpha$ which is restricted to $0 \le \alpha < 1$ by the constraints on a Lévy density and (15). The corresponding distributions of $T$ are known as generalized gamma distributions [^]. Note that the usual family of gamma distributions is recovered for $\alpha = 0$, and that a stable distribution with index $\alpha$ is obtained for $b = 0$ and $0 < \alpha < 1$. One can also take $\alpha = -\kappa$ for arbitrary $\kappa > 0$, except that in this model the Lévy measure has a total mass $\psi(\infty) < \infty$ so

$$\mathbb{P}(T = 0) = \exp(-\psi(\infty)) > 0,$$

contrary to the present assumption that the distribution of $T$ has a density. Such models can be analyzed by first conditioning on the Poisson total number of lengths, which reduces the model to one with say $m$ i.i.d. lengths with probability density proportional to $\rho$. In the case (50) for $\alpha = -\kappa$, that is to say that the lengths are i.i.d. gamma($\kappa$, $b$) variables. This model for random partitions has been extensively studied. It is well known that features of the $\mathrm{PD}(\theta)$ model can be derived by taking limits of this more elementary model with $m$ i.i.d. gamma($\kappa$, $b$) lengths as $\kappa \to 0$ and $m \to \infty$ with $\kappa m \to \theta$. See [^] for a review of this circle of ideas and its applications to species sampling models.

The $\mathrm{PK}(\rho_{\alpha,c,b})$ model for a random partition defined by $\rho_{\alpha,c,b}$ in (50) for $0 \le \alpha < 1$ was proposed by McCloskey [3^], who first exploited the key idea of size-biased sampling in the setting of species sampling problems. Due to the remarks in Section 4 about scaling and exponential tilting, for $0 < \alpha < 1$ the family of $\mathrm{PK}(\rho_{\alpha,c,b}, \gamma)$ distributions, as $\gamma$ varies over all distributions on $(0, \infty)$, depends only on $\alpha$ and not on $c$ or $b$. So in studying this family of distributions on $\mathcal{P}_1^{\downarrow}$ and their associated exchangeable partitions of $\mathbb{N}$, the choice of $c$ and $b$ is entirely a matter of convenience. This study is taken up in the next section, with the choice of $b = 0$ and $c = \alpha/\Gamma(1 - \alpha)$ which leads to the simplest form of most results. See also [24, ^] regarding generalized gamma random measures and further developments.

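5.3 The stable($\alpha$) model

Suppose now that $\mathbb{P}_{\alpha}$ governs the Poisson model for $T$ with stable($\alpha$) distribution with Laplace transform

$$\mathbb{E}_{\alpha}[\exp(-\lambda T)] = \int_0^{\infty} e^{-\lambda x} f_{\alpha}(x) \, dx = \exp(-\lambda^{\alpha}) \tag{51}$$

for some $0 < \alpha < 1$, where $f_{\alpha}(x)$ is the stable($\alpha$) density of $T$, that is [52]

$$f_{\alpha}(t) = \frac{-1}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin(\pi \alpha k) \, \frac{\Gamma(\alpha k + 1)}{t^{\alpha k + 1}}. \tag{52}$$

For $\alpha = \frac{1}{2}$ this reduces to the following formula of Doetsch [14, pp. 401-402] and Lévy [^]:

$$\mathbb{P}_{\frac{1}{2}}(2T \in dx)/dx = \tfrac{1}{2} f_{\frac{1}{2}}(\tfrac{1}{2} x) = \frac{1}{\sqrt{2\pi}} \, x^{-\frac{3}{2}} e^{-\frac{1}{2x}}. \tag{53}$$

Special results for $\alpha = \frac{1}{2}$, discussed in Section 8, involve cancellations due to simplification of $f_{\alpha}(\bar{p} t)/f_{\alpha}(t)$ for $0 < p < 1$, which does not appear to be possible for general $\alpha$. The Lévy density corresponding to the Laplace transform (51) is well known to be

$$\rho_{\alpha}(x) = \frac{\alpha}{\Gamma(1 - \alpha)} \, x^{-\alpha - 1} \qquad (x > 0). \tag{54}$$

Write $\mathbb{P}_{\alpha}(\,\cdot \mid t)$ for $\mathbb{P}_{\alpha}(\,\cdot \mid T = t)$. So the $\mathbb{P}_{\alpha}$ distribution of $(P_i)$ on $\mathcal{P}_1^{\downarrow}$ is $\mathrm{PK}(\rho_{\alpha})$, and the $\mathbb{P}_{\alpha}(\,\cdot \mid t)$ distribution of $(P_i)$ is $\mathrm{PK}(\rho_{\alpha} \mid t)$. Note from (51) that if $T_c$ is the total length in the model governed by $c \rho_{\alpha}$ for a constant $c > 0$, then $T_c$ has the same distribution as $c^{1/\alpha} T_1$ for $T_1 = T$ as in (51). Together with similar scaling properties of the lengths $J_i$, this implies that for all $0 < \alpha < 1$ and $t > 0$ there is the formula

$$\mathrm{PK}(c \rho_{\alpha} \mid t) = \mathrm{PK}(\rho_{\alpha} \mid c^{-1/\alpha} t). \tag{55}$$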
Formulas for the $\mathrm{PK}(\rho_{\alpha} \mid t)$ distribution are described in Section 5.4. These formulas can be understood as disintegrations of simpler formulas obtained in [43], and recalled in Section 6, for a particular subfamily of the class of $\mathrm{PK}(\rho_{\alpha}, \gamma)$ distributions.

One reason for special interest in the Kingman family associated with the stable Lévy densities $\rho_{\alpha}$ is the following result which will be proved elsewhere.



Theorem 8 An exchangeable random partition $\Pi$ of $\mathbb{N}$ with an infinite number of classes with proper frequencies has an EPPF of the Gibbs form

$$p(n_1, \ldots, n_k) = c_{n,k} \prod_{i=1}^{k} w_{n_i} \qquad \text{where } n = \sum_{i=1}^{k} n_i \tag{56}$$

for some positive weights $w_1 = 1, w_2, w_3, \ldots$ and some $c_{n,k}$ if and only if

$$w_m = \prod_{j=1}^{m-1} (j - \alpha) \qquad (m = 1, 2, \ldots)$$

for some $0 \le \alpha < 1$. If $\alpha = 0$ then the distribution of $\Pi$ corresponds to $\int_0^{\infty} \mathrm{PD}(\theta) \, \gamma(d\theta)$ for some probability distribution $\gamma$ on $(0, \infty)$, whereas if $0 < \alpha < 1$ then the distribution of $\Pi$ corresponds to $\mathrm{PK}(\rho_{\alpha}, \gamma) := \int_0^{\infty} \mathrm{PK}(\rho_{\alpha} \mid t) \, \gamma(dt)$ for some $\gamma$.

See also Kerov and Zabell [^] for related characterizations of the two-parameter family discussed in Section 6. This family is characterized by an EPPF of the form (56) with $c_{n,k}$ a product of a function of $n$ and a function of $k$.



5.4 Conditioning on T 

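Assume throughout this section that $0 < \alpha < 1$. Immediately from (20) and (54), in the $\mathrm{PK}(\rho_{\alpha} \mid t)$ model, the distribution of $\tilde{P}_1$ has density

$$\tilde{f}_{\alpha}(p \mid t) = \frac{\alpha}{\Gamma(1 - \alpha)} \, (pt)^{-\alpha} \, \frac{f_{\alpha}(\bar{p} t)}{f_{\alpha}(t)} \qquad (0 < p < 1; \ \bar{p} := 1 - p). \tag{57}$$

Let $h$ be a non-negative measurable function with $\mathbb{E}_{\alpha} h(T) = \int_0^{\infty} h(t) f_{\alpha}(t) \, dt = 1$, and let $h \cdot f_{\alpha}$ denote the distribution on $(0, \infty)$ with density $h(t) f_{\alpha}(t)$. Then by integration from (57), under the probability $\mathbb{P}_{\alpha,h}$ governing the $\mathrm{PK}(\rho_{\alpha}, h \cdot f_{\alpha})$ model, the structural distribution of $\tilde{P}_1$ has density

$$\mathbb{P}_{\alpha,h}(\tilde{P}_1 \in dp)/dp = \frac{\alpha}{\Gamma(1 - \alpha)} \, p^{-\alpha} (1 - p)^{\alpha - 1} \, \eta_{\alpha,h}(1 - p) \qquad (0 < p < 1) \tag{58}$$

where

$$\eta_{\alpha,h}(u) := \int_0^{\infty} v^{-\alpha} h(v/u) f_{\alpha}(v) \, dv = \mathbb{E}_{\alpha}[T^{-\alpha} h(T/u)]. \tag{59}$$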
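For instance, it is known [^] that

$$C_{\alpha,\theta} := \mathbb{E}_{\alpha}(T^{-\theta}) = \frac{\Gamma(\frac{\theta}{\alpha} + 1)}{\Gamma(\theta + 1)} \qquad (\theta > -\alpha). \tag{60}$$

So for $\theta > -\alpha$, (58) and (59) imply:

$$\text{if } h(t) = C_{\alpha,\theta}^{-1} \, t^{-\theta} \text{ then } \tilde{P}_1 \text{ has beta}(1 - \alpha, \alpha + \theta) \text{ distribution.} \tag{61}$$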
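The step from the moment formula $\mathbb{E}_\alpha(T^{-\theta}) = \Gamma(\theta/\alpha + 1)/\Gamma(\theta + 1)$ to the beta distribution can be spelled out as follows. This is a sketch of the calculation, writing $\eta_{\alpha,h}(u) = \mathbb{E}_\alpha[T^{-\alpha} h(T/u)]$ and $C_{\alpha,\theta} = \mathbb{E}_\alpha(T^{-\theta})$ as above.

```latex
% With h(t) = C_{\alpha,\theta}^{-1} t^{-\theta}:
\eta_{\alpha,h}(u)
  = C_{\alpha,\theta}^{-1} \, u^{\theta} \, \mathbb{E}_\alpha\big[T^{-(\alpha + \theta)}\big]
  = \frac{C_{\alpha,\alpha+\theta}}{C_{\alpha,\theta}} \, u^{\theta}.
% Hence the structural density is proportional to p^{-\alpha} (1-p)^{\alpha + \theta - 1}:
\frac{\alpha}{\Gamma(1-\alpha)} \, \frac{C_{\alpha,\alpha+\theta}}{C_{\alpha,\theta}} \,
  p^{-\alpha} (1 - p)^{\alpha + \theta - 1}.
% The constant is the beta(1-\alpha, \alpha+\theta) normalization, because
\frac{C_{\alpha,\alpha+\theta}}{C_{\alpha,\theta}}
  = \frac{\Gamma(\frac{\theta}{\alpha} + 2)}{\Gamma(\alpha + \theta + 1)}
    \cdot \frac{\Gamma(\theta + 1)}{\Gamma(\frac{\theta}{\alpha} + 1)}
  = \frac{(\theta + \alpha) \, \Gamma(\theta + 1)}{\alpha \, \Gamma(\alpha + \theta + 1)}
  = \frac{\Gamma(\theta + 1)}{\alpha \, \Gamma(\alpha + \theta)},
\qquad
\frac{\alpha}{\Gamma(1-\alpha)} \cdot \frac{\Gamma(\theta + 1)}{\alpha \, \Gamma(\alpha + \theta)}
  = \frac{\Gamma(1 + \theta)}{\Gamma(1 - \alpha) \, \Gamma(\alpha + \theta)} .
```

The last expression is the reciprocal of the beta function $B(1 - \alpha, \alpha + \theta)$, so the density integrates to one, as it must.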

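This example is discussed further in the next section. As another example, if $h(t) = \exp(b^{\alpha} - bt)$ for some $b > 0$, then according to (46) the model $\mathrm{PK}(\rho_{\alpha}, h \cdot f_{\alpha})$ is identical to the unconditioned generalized gamma model $\mathrm{PK}(\rho_{\alpha,b})$ with

$$\rho_{\alpha,b}(x) := \frac{\alpha}{\Gamma(1 - \alpha)} \, x^{-\alpha - 1} e^{-bx}.$$

So the structural density of the $\mathrm{PK}(\rho_{\alpha,b})$ model is given by formula (58) with

$$\eta_{\alpha,h}(u) = \exp(b^{\alpha}) \, \mathbb{E}_{\alpha}[T^{-\alpha} \exp(-bT/u)]. \tag{62}$$

For $\alpha = \frac{1}{2}$ the expectation in (62) can be evaluated by using (53) to write for $\xi > 0$

$$\mathbb{E}_{\frac{1}{2}}[T^{-\frac{1}{2}} \exp(-\xi T)] = \frac{1}{\sqrt{\pi}} \int_0^{\infty} x^{-2} e^{-(\xi x + 1/x)/2} \, dx = 2 \sqrt{\xi/\pi} \; K_1(\sqrt{\xi}) \tag{63}$$

where $K_1$ is the usual modified Bessel function. Thus for $b > 0$ the $\mathrm{PK}(\rho_{\frac{1}{2},b})$ model associated with the inverse Gaussian distribution has structural distribution with density $\tilde{f}_{\frac{1}{2},b}$ given by the formula, obtained by substituting (62) and (63) with $\xi = b/\bar{p}$ into (58),

$$\tilde{f}_{\frac{1}{2},b}(p) = \frac{\sqrt{b} \, e^{\sqrt{b}}}{\pi} \, p^{-\frac{1}{2}} \, \bar{p}^{-1} \, K_1\big(\sqrt{b/\bar{p}}\,\big) \qquad (0 < p < 1; \ \bar{p} := 1 - p). \tag{64}$$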

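Proposition 9 For $0 < \alpha < 1$, $q > 0$ let $\mu_{\alpha}(q \mid t)$ denote the $q$th moment of the structural density (57) of the $\mathrm{PK}(\rho_{\alpha} \mid t)$ distribution:

$$\mu_{\alpha}(q \mid t) := \int_0^1 p^q \tilde{f}_{\alpha}(p \mid t) \, dp = \mathbb{E}_{\alpha}(\tilde{P}_1^{\,q} \mid t). \tag{65}$$

Then for each $t > 0$ the EPPF of a $\mathrm{PK}(\rho_{\alpha} \mid t)$ partition of $\mathbb{N}$ is

$$p_{\alpha}(n_1, \ldots, n_k \mid t) = \frac{\Gamma(1 - \alpha)}{\Gamma(n - k\alpha)} \Big( \frac{\alpha}{t^{\alpha}} \Big)^{k-1} \mu_{\alpha}(n - 1 - k\alpha + \alpha \mid t) \prod_{i=1}^{k} [1 - \alpha]_{n_i - 1} \tag{66}$$

where

$$[x]_m := \prod_{j=1}^{m} (x + j - 1) = \frac{\Gamma(x + m)}{\Gamma(x)}.$$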
Alternatively,

$$p_{\alpha}(n_1, \ldots, n_k \mid t) = \frac{\alpha^k}{t^n} \, g_{\alpha}(n - k\alpha \mid t) \prod_{i=1}^{k} [1 - \alpha]_{n_i - 1} \tag{67}$$

where $g_{\alpha}(q \mid t) := (\Gamma(q) f_{\alpha}(t))^{-1} \int_0^t f_{\alpha}(t - v) \, v^{q-1} \, dv$.

Proof. This is read from Theorem 4, since the integral (28) reduces to a standard Dirichlet integral. □


As checks on (66), the symmetry in $(n_1,\dots,n_k)$ is still evident, and $p_\alpha(n \mid t) = \mu_\alpha(n-1 \mid t)$
as required by (^). However, the addition rules (5) for this EPPF are
not at all obvious. Rather, they amount to the following identity involving moments
of the structural distribution:

Corollary 10 The moments $\mu_\alpha(q \mid t)$ of the structural distribution on $(0,1)$ associated
with the $\mathrm{PK}(\rho_\alpha \mid t)$ distribution satisfy the following identity: for all $1 \le k \le n$
and $t > 0$

$$\mu_\alpha(n-1-k\alpha+\alpha \mid t) = \mu_\alpha(n-k\alpha+\alpha \mid t) + \frac{\Gamma(n-k\alpha)\,\alpha\,t^{-\alpha}}{\Gamma(n+1-k\alpha-\alpha)}\, \mu_\alpha(n-k\alpha \mid t). \tag{68}$$

To illustrate, according to the simplest addition rule (6),

$$1 = p_\alpha(2 \mid t) + p_\alpha(1,1 \mid t),$$

which amounts to (68) for $n = k = 1$, that is

$$1 = \mu_\alpha(1 \mid t) + \frac{\Gamma(1-\alpha)\,\alpha\,t^{-\alpha}}{\Gamma(2-2\alpha)}\, \mu_\alpha(1-\alpha \mid t). \tag{69}$$



The addition rule underlying (69) can be checked for general $\alpha$ by an argument
described in Section 6. In the case $\alpha = \tfrac12$, the later formulae (99) and (93) show that
(69) reduces to a known recursion (106) for the Hermite function.

Repeated application of (68) shows that for each $1 \le k \le n$ the moment
appearing in (66) can be expressed as a linear combination of integer moments $\mu_\alpha(j \mid t)$
for $j = 0,\dots,n-1$, with coefficients depending on $n, k, \alpha, t$ which could easily be
computed recursively. But except in the special case $\alpha = \tfrac12$ discussed in Section 8,
even the integer moments seem difficult to evaluate.

6 The two-parameter Poisson-Dirichlet family 

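For $0 < \alpha < 1$, $\theta > -\alpha$, let $\gamma_{\alpha,\theta}$ denote the distribution on $(0,\infty)$ with density $C_{\alpha,\theta}^{-1} t^{-\theta}$
at $t$ relative to the stable($\alpha$) distribution of $T$ defined by (^), that is

$$\gamma_{\alpha,\theta}(dt) = C_{\alpha,\theta}^{-1}\, t^{-\theta} f_\alpha(t)\,dt \tag{70}$$

where $C_{\alpha,\theta} := \mathbb{E}_\alpha(T^{-\theta}) = \Gamma(\theta/\alpha+1)/\Gamma(\theta+1)$ as in (60) and (61).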

Definition 11 [50] The Poisson-Dirichlet distribution with two parameters $(\alpha,\theta)$,
denoted $\mathrm{PD}(\alpha,\theta)$, is the distribution of ranked frequencies defined for $0 \le \alpha < 1$, $\theta > -\alpha$ by

PD(a,0) = i ) ' N r n 1/1 (71) 

\ pk(p„,7„,0) for < a < 1,6* > -a 
This family of distributions has some remarkable properties and applications.
As shown in [41], it follows from Lemma ^ that if $(P_i)$ has $\mathrm{PD}(\alpha,\theta)$ distribution then
the corresponding size-biased sequence $(\tilde P_j)$ can be represented as

$$\tilde P_j = W_j \prod_{i=1}^{j-1} (1 - W_i) \tag{72}$$

where the $W_j$ are independent with beta$(1-\alpha, \theta+j\alpha)$ distributions. (73)

So the $\mathrm{PD}(\alpha,\theta)$ distribution can just as well be defined, without reference to the
Poisson-Kingman construction, as the distribution of $(P_i)$ defined by ranking $(\tilde P_j)$
constructed by (72) from independent $W_j$ as in (73). The sequence $(\tilde P_j)$ defined
by (72) and (73) for $0 < \alpha < 1$ and $\theta > 0$ was considered by Engen [15] as a
model for species abundances. See [50] for further study of the $\mathrm{PD}(\alpha,\theta)$ family. It
was shown in [44] that if $(P_i)$ is a random sequence of ranked frequencies with $P_i > 0$ a.s. for all
$i$ and the corresponding size-biased sequence $(\tilde P_j)$ admits the representation (72)
with independent residual fractions $W_j$, then the $W_j$ must have beta distributions
as described in (73), and hence the distribution of $(P_i)$ must be $\mathrm{PD}(\alpha,\theta)$ for some
$0 \le \alpha < 1$ and $\theta > -\alpha$. Reformulated in terms of random partitions, and combined
with Proposition ^, this yields the following:

Proposition 12 Let $\Pi$ be the exchangeable random partition of $\mathbb{N}$ derived by sampling
from a random sequence of ranked frequencies $(P_i)$ with $P_i > 0$ for all $i$. Let $\Pi_k$ be derived from $\Pi$
by deletion of the first $k$ classes of $\Pi$, with classes in order of appearance, as defined
above Proposition ^. Then the following are equivalent:

(i) for each $k$, $\Pi_k$ is independent of the frequencies $(\tilde P_1, \dots, \tilde P_k)$ of the first $k$ classes
of $\Pi$;

(ii) $\Pi$ is a $\mathrm{PD}(\alpha,\theta)$-partition for some $0 \le \alpha < 1$ and $\theta > -\alpha$, in which case $\Pi_k$ is a
$\mathrm{PD}(\alpha, \theta + k\alpha)$-partition.
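The stick-breaking representation (72)-(73) can be sketched numerically as follows. This is a minimal illustration rather than a construction taken from the text: the function name `size_biased_frequencies`, the truncation after `n_terms` terms, and the use of Python's standard `random.betavariate` are all assumptions made for the example.

```python
import random

def size_biased_frequencies(alpha, theta, n_terms, rng):
    """First n_terms of the size-biased sequence built as in (72)-(73).

    Hypothetical helper: truncates the infinite sequence after n_terms terms
    and also returns the mass that has not yet been assigned.
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    remaining = 1.0  # prod_{i<j} (1 - W_i), the mass still unassigned
    freqs = []
    for j in range(1, n_terms + 1):
        # W_j ~ beta(1 - alpha, theta + j * alpha), independent over j, as in (73)
        w = rng.betavariate(1.0 - alpha, theta + j * alpha)
        freqs.append(w * remaining)  # formula (72)
        remaining *= 1.0 - w
    return freqs, remaining

rng = random.Random(0)
freqs, remaining = size_biased_frequencies(0.5, 0.0, 200, rng)
ranked = sorted(freqs, reverse=True)  # approximate ranked frequencies (P_i)
```

Ranking the truncated sequence only approximates the ranked frequencies, since the unassigned mass `remaining` is ignored; taking `n_terms` larger reduces that mass.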

As shown in [43], the independence property (72) of the residual fractions $W_j$ of
a $\mathrm{PD}(\alpha,\theta)$-partition allows the corresponding EPPF $p_{\alpha,\theta}(n_1,\dots,n_k)$ to be evaluated
using (^). The result is as follows. For all $0 \le \alpha < 1$ and $\theta > -\alpha$,

$$p_{\alpha,\theta}(n_1,\dots,n_k) = \frac{[\theta+\alpha]_{k-1;\alpha}}{[\theta+1]_{n-1}} \prod_{i=1}^k [1-\alpha]_{n_i-1} \tag{74}$$

where $n = \sum_{i=1}^k n_i$, and for real $x$ and $a$ and non-negative integer $m$

$$[x]_{m;a} = \begin{cases} 1 & \text{for } m = 0 \\ x(x+a)\cdots(x+(m-1)a) & \text{for } m = 1, 2, \dots \end{cases}$$

and $[x]_m = [x]_{m;1}$. The previous formula (^9) is the special case of (74) for $\alpha = 0$.
Both this case of (74), and the case when $0 < \alpha < 1$ and $\theta = 0$, follow easily from (36).
Formula (74) allows a $\mathrm{PD}(\alpha,\theta)$ partition $\Pi$ of $\mathbb{N}$ to be constructed sequentially
as follows [43]. Starting from $\Pi_1 = \{\{1\}\}$, given that $\Pi_n$ has been constructed as
a partition of $\mathbb{N}_n$ with say $k$ blocks of sizes $(n_1,\dots,n_k)$, define $\Pi_{n+1}$ by assigning the
new element $n+1$ to the $j$th class whose current size is $n_j$ with probability

$$\mathbb{P}(j\uparrow \mid n_1,\dots,n_k) = \frac{n_j - \alpha}{n + \theta} \tag{75}$$

for $1 \le j \le k$, and assigning $n+1$ to a new class numbered $k+1$ with the remaining
probability

$$\mathbb{P}((k+1)\uparrow \mid n_1,\dots,n_k) = \frac{\theta + k\alpha}{n + \theta}. \tag{76}$$

For $\alpha = 0$ and $\theta > 0$ this is the generalization of Pólya's urn scheme developed by
Blackwell-MacQueen [^] and Hoppe [^]. See [^] for consideration of more
general prediction rules for exchangeable random partitions.
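The EPPF (74) and the prediction rules (75)-(76) can be cross-checked in a few lines. The sketch below is illustrative only: the names `rising`, `eppf` and `sample_block_sizes` are hypothetical, exact rational arithmetic is used for the EPPF, and floating-point arithmetic for the sampler.

```python
import random
from fractions import Fraction

def rising(x, m, a=1):
    """Generalized rising factorial [x]_{m;a} = x (x+a) ... (x+(m-1)a), with [x]_{0;a} = 1."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i * a
    return out

def eppf(sizes, alpha, theta):
    """EPPF (74) of a PD(alpha, theta) partition with block sizes n_1, ..., n_k."""
    n, k = sum(sizes), len(sizes)
    value = rising(theta + alpha, k - 1, alpha) / rising(theta + 1, n - 1)
    for n_i in sizes:
        value *= rising(1 - alpha, n_i - 1)
    return value

def sample_block_sizes(n, alpha, theta, rng):
    """Block sizes, in order of appearance, after n steps of the rules (75)-(76)."""
    sizes = [1]
    for m in range(1, n):  # m elements have been placed so far
        u = rng.random() * (m + theta)
        for j, n_j in enumerate(sizes):
            u -= n_j - alpha  # weight of existing class j, rule (75)
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes.append(1)  # remaining weight theta + k * alpha, rule (76)
    return sizes

# The addition rule: p(n_1..n_k) = sum_j p(.., n_j + 1, ..) + p(n_1..n_k, 1).
a, t = Fraction(1, 3), Fraction(2, 5)
base = [3, 1, 2]
children = [base[:j] + [base[j] + 1] + base[j + 1:] for j in range(len(base))]
addition_rule_ok = eppf(base, a, t) == sum(eppf(c, a, t) for c in children) + eppf(base + [1], a, t)
```

With exact fractions the addition rule holds as an identity, so `addition_rule_ok` is a strict equality test rather than a numerical tolerance check.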



The following calculation shows how to derive either of the two EPPF's (74) and
(66) from the other. The argument also shows that the function $p_\alpha(n_1,\dots,n_k \mid t)$
defined by (66) satisfies the addition rules of an EPPF as a consequence of the
corresponding addition rules for $p_{\alpha,\theta}(n_1,\dots,n_k)$, which are much more obvious.

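The kernel $\gamma_{\alpha,\theta}(dt)$ introduced in (70) is now viewed for a fixed $\alpha$ as a family of
probability distributions on $(0,\infty)$ indexed by $\theta \in (-\alpha,\infty)$, that is a Markov kernel
$\gamma_\alpha$ from $(-\alpha,\infty)$ to $(0,\infty)$. For a non-negative measurable function $h = h(t)$ with
domain $(0,\infty)$, define a function $\gamma_\alpha h = (\gamma_\alpha h)(\theta)$ with domain $(-\alpha,\infty)$ by the usual
action of this Markov kernel as an integral operator:

$$(\gamma_\alpha h)(\theta) := \int_0^\infty \gamma_{\alpha,\theta}(dt)\, h(t). \tag{77}$$

Then say $(\gamma_\alpha h)(\theta)$ is the $\gamma_\alpha$-transform of $h(t)$. Let $\mathbb{E}_{\alpha,\theta}$ denote expectation with
respect to the probability distribution

$$\mathbb{P}_{\alpha,\theta}(\cdot) := \int_0^\infty \gamma_{\alpha,\theta}(dt)\, \mathbb{P}_\alpha(\cdot \mid t).$$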

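By definition, for each non-negative random variable $X$ governed by the family of
conditional laws $(\mathbb{P}_\alpha(\cdot \mid t), t > 0)$,

the $\gamma_\alpha$-transform of $\mathbb{E}_\alpha(X \mid t)$ is $\mathbb{E}_{\alpha,\theta}(X)$. (78)

In particular, for each $(n_1,\dots,n_k)$,

the $\gamma_\alpha$-transform of $p_\alpha(n_1,\dots,n_k \mid t)$ is $p_{\alpha,\theta}(n_1,\dots,n_k)$. (79)

An obvious change of variable allows uniqueness and inversion results for the
$\gamma_\alpha$-transform to be deduced from standard results for Mellin or bilateral exponential
transforms. So the problem is just to show that the $\gamma_\alpha$-transform of the right side of
(66) is the right side of (74). To see this, observe first that for each $q > 0$, by (65) and (78),

$$\text{the } \gamma_\alpha\text{-transform of } \mu_\alpha(q \mid t) \text{ is } \mathbb{E}_{\alpha,\theta}(\tilde P_1^q) = \frac{\Gamma(q+1-\alpha)\,\Gamma(\theta+1)}{\Gamma(1-\alpha)\,\Gamma(q+\theta+1)} \tag{80}$$

where $\mathbb{E}_{\alpha,\theta}(\tilde P_1^q)$ is evaluated using (61). To deal with the factor of $t^{-\alpha(k-1)}$ in (66),
note from (70) that for each $\beta > 0$, and any $h(t)$,

$$\text{the } \gamma_\alpha\text{-transform of } t^{-\beta}h(t) \text{ is } \frac{\Gamma(\frac{\theta+\beta}{\alpha}+1)\,\Gamma(\theta+1)}{\Gamma(\frac{\theta}{\alpha}+1)\,\Gamma(\theta+\beta+1)}\,(\gamma_\alpha h)(\theta+\beta). \tag{81}$$

By (80) for $q = n-1-k\alpha+\alpha$ and (81) for $\beta = \alpha k - \alpha$ and $h(t) = \mu_\alpha(q \mid t)$ the right
side of (66) has for its $\gamma_\alpha$-transform the following function of $\theta$:

$$\frac{\alpha^{k-1}\,\Gamma(1-\alpha)}{\Gamma(n-k\alpha)} \cdot \frac{\Gamma(\frac{\theta}{\alpha}+k)\,\Gamma(\theta+1)}{\Gamma(\frac{\theta}{\alpha}+1)\,\Gamma(\theta+k\alpha-\alpha+1)} \cdot \frac{\Gamma(n-k\alpha)\,\Gamma(1+\theta+k\alpha-\alpha)}{\Gamma(n+\theta)\,\Gamma(1-\alpha)} \prod_{i=1}^k [1-\alpha]_{n_i-1}$$

which reduces by cancellation to the right side of (74).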

6.1 The $\alpha$-diversity

Let $\Pi$ be an exchangeable random partition of $\mathbb{N}$ with ranked frequencies $(P_i)$. Let
$K_n$ denote the number of classes of $\Pi_n$, the partition of $\mathbb{N}_n$ induced by $\Pi$. Say that
$\Pi$ has $\alpha$-diversity $S$ and write $\alpha$-DIVERSITY$(\Pi) = S$ iff there exists a random variable
$S$ with $0 < S < \infty$ a.s. and

$$K_n \sim S n^\alpha \text{ as } n \to \infty \tag{82}$$

where for two sequences of random variables $A_n$ and $B_n$, the notation $A_n \sim B_n$ will
now be used to indicate that $A_n/B_n \to 1$ almost surely as $n \to \infty$. According to a
result of Karlin [27], applied conditionally given $(P_i)$, if these ranked frequencies are
such that

$$P_i \sim \left(\frac{S}{\Gamma(1-\alpha)\, i}\right)^{1/\alpha} \text{ as } i \to \infty \tag{83}$$

for some $0 < S < \infty$ then $\Pi$ has $\alpha$-diversity $S$.
Proposition 13 Suppose $\Pi$ is a $\mathrm{PK}(\rho_\alpha, \gamma)$ partition of $\mathbb{N}$ for some $0 < \alpha < 1$ and
some probability distribution $\gamma$ on $(0,\infty)$. Then

(i) $\alpha$-DIVERSITY$(\Pi) = S$ for a random variable $S$ with $S = T^{-\alpha}$ where $T = S^{-1/\alpha}$
has distribution $\gamma$. In particular, $S = t^{-\alpha}$ is constant if $\Pi$ is a $\mathrm{PK}(\rho_\alpha \mid t)$ partition.

(ii) A regular conditional distribution for $\Pi$ given $S = s$ is defined by the EPPF
$p_\alpha(n_1,\dots,n_k \mid s^{-1/\alpha})$ obtained by setting $t = s^{-1/\alpha}$ in (66).

(iii) In particular, both (i) and (ii) hold if $\Pi$ is a $\mathrm{PD}(\alpha,\theta)$ partition for some $\theta > -\alpha$.
Then the $\alpha$-diversity $S$ of $\Pi$ is $S = T^{-\alpha}$ for $T$ with the distribution $\gamma_{\alpha,\theta}$ defined by (70).

Proof. Suppose that $(P_i)$ has $\mathrm{PK}(\rho_\alpha, \gamma)$ distribution. The fact that (83) holds for
$S = T^{-\alpha}$ in the unconditioned case where $T$ has stable($\alpha$) distribution is due to
Kingman [2^]. Kingman's argument, using the law of large numbers for small jumps
of the Poisson process, applies just as well for $T$ conditioned to be a constant $t$. So
(82) follows in general by mixing over $t$. □

See [^] and papers cited there for further information about the Mittag-Leffler
distribution of $S = T^{-\alpha}$ derived from a $\mathrm{PD}(\alpha, 0)$ partition. The corresponding
distribution of $S$ for $\mathrm{PD}(\alpha,\theta)$ has density at $s$ proportional to $s^{\theta/\alpha}$ relative to this
Mittag-Leffler distribution.

As shown in [50, Proposition 10], if $\Pi$ is a partition of $\mathbb{N}$ whose ranked frequencies
$(P_i)$ have the $\mathrm{PD}(\alpha, 0)$ distribution, then $S = \alpha$-DIVERSITY$(\Pi)$ can be recovered from
$\Pi$ or $(P_i)$ via either (82) or (83). Then $T = S^{-1/\alpha}$ has stable($\alpha$) distribution as in
(51), and $(T P_i)$ is then the sequence of points of a Poisson process with Lévy density
$\rho_\alpha$. See also [^, ^] for more about the distribution of $K_n$ derived from a $\mathrm{PD}(\alpha,\theta)$
partition.



7 Application to lengths of excursions. 



This section reviews some results of [41, 46, 50]. Let $\mathbb{P}^0$ govern a strong Markov
process $B$ starting at a recurrent point $0$ of its statespace, such that the inverse
$(\tau_\ell, \ell \ge 0)$ of the local time process $(L_t, t \ge 0)$ of $B$ at zero is a stable subordinator
of index $\alpha$ for some $0 < \alpha < 1$. That is to say, $\mathbb{E}^0 \exp(-\lambda\tau_1) = \exp(-c\lambda^\alpha)$ for some
constant $c > 0$. So the $\mathbb{P}^0$ distribution of $\tau_1$ is the $\mathbb{P}_\alpha$ distribution of $c^{1/\alpha}T$ for $T$ as in
(51). For example, $B$ could be a one-dimensional Brownian motion ($\alpha = \tfrac12$) or Bessel
process of dimension $2 - 2\alpha$. In the Brownian case, take $c = \sqrt{2}$ to obtain the usual
normalization of local time as occupation density relative to Lebesgue measure, which
makes $L_1 \stackrel{d}{=} |B_1|$. Let $M = \{t : 0 \le t \le 1, B_t = 0\}$ denote the random closed subset
of $[0,1]$ defined by the zero set of $B$. Component intervals of the complement of $M$
relative to $[0,1]$ are called excursion intervals. For $0 \le t \le 1$ let $G_t = \sup\{M \cap [0,t]\}$,
the last zero of $B$ before time $t$. Note that with probability one, $G_1 < 1$, so one of
the excursion intervals is the meander interval $(G_1, 1]$, whose length $1 - G_1$ is one
of the lengths appearing in the list $(P_i)$ say of ranked lengths of excursion intervals.
According to the main result of [^],

the sequence $(P_i)$ of ranked lengths has $\mathrm{PD}(\alpha, 0)$ distribution. (84)

Let $U_1, U_2, \dots$ be a sequence of i.i.d. uniform $[0,1]$ random variables, independent
of $B$, called the sequence of sample points. Let $\Pi = (\Pi_n)$ be the random partition
of $\mathbb{N}$ generated by the random equivalence relation $i \sim j$ iff $G_{U_i} = G_{U_j}$. That
is to say $i \sim j$ iff $U_i$ and $U_j$ fall in the same excursion interval. So for example
$\Pi_5 = \{\{1,2,5\},\{3\},\{4\}\}$ iff $U_1$, $U_2$ and $U_5$ fall in one excursion interval, $U_3$ in another,
and $U_4$ in a third. By translation of results of [50] into present notation

$\Pi$ is a $\mathrm{PD}(\alpha, 0)$ partition and $\alpha$-DIVERSITY$(\Pi) = cL_1$ (85)

where $L_1$ is the local time of $B$ at zero up to time 1. By construction, the sequence
$(\tilde P_j)$ of class frequencies of $\Pi$ is the sequence of lengths of excursion intervals in the
order they are discovered by the sample points, and $(P_i)$ is recovered from $(\tilde P_j)$ by
ranking. To illustrate formula (74), $U_1$ and $U_2$ fall in different excursion intervals
with probability $p_{\alpha,0}(1,1) = \alpha$, and in the same one with probability $p_{\alpha,0}(2) = 1 - \alpha$.
Similarly, given that the local time is $L_1 = \ell$, two sample points fall in the same
excursion interval with probability $p_\alpha(2 \mid (c\ell)^{-1/\alpha})$, and in different excursion intervals
with probability $p_\alpha(1,1 \mid (c\ell)^{-1/\alpha})$, for $p_\alpha(\cdots \mid t)$ defined by (66). See Section 8 for
evaluation of these functions in the case $\alpha = \tfrac12$ corresponding to a Brownian motion
$B$.



Let $R_n = 1 - \tilde P_1 - \cdots - \tilde P_n$, which is the total length of excursions which remain
undiscovered after the sampling process has found $n$ distinct excursion intervals. The
result of Proposition 12 in this setting, due to [41], is that for each $n = 0, 1, 2, \dots$ a
$\mathrm{PD}(\alpha, n\alpha)$ distributed sequence is obtained by ranking the sequence

$$\frac{1}{R_n}(\tilde P_{n+1}, \tilde P_{n+2}, \dots) \tag{86}$$

of relative excursion lengths which remain after discovery of the first $n$ intervals. For
$n = 1$ the same $\mathrm{PD}(\alpha, \alpha)$ distribution is obtained more simply by deleting the meander
of length $1 - G_1$, renormalizing and reranking. This is due to the result of [49] that the
length $1 - G_1$ of the meander interval is a size-biased choice from $(P_i)$. As the excursion
lengths in this case are just the excursion lengths of a standard bridge, equivalent to
conditioning on $B_1 = 0$, the ranked excursion lengths of such a bridge have $\mathrm{PD}(\alpha, \alpha)$
distribution. As first shown in [49], this implies that both the unconditioned process
$B$ and the bridge $B$ given $B_1 = 0$ share a common conditional distribution for the
ranked excursion lengths $(P_i)$ given the local time $L_1$. In present notation, this
conditional distribution of $(P_i)$ given $L_1 = \ell$, with or without conditioning on $B_1 = 0$,
is $\mathrm{PK}(\rho_\alpha \mid (c\ell)^{-1/\alpha})$.
One final identity is worth noting. As a consequence of the above discussion, for
the process $B$, the conditional distribution of the meander length $1 - G_1$ given $L_1 = \ell$
is given by

$$\mathbb{P}^0(1 - G_1 \in dp \mid L_1 = \ell) = \mathbb{P}^0(\tilde P_1 \in dp \mid L_1 = \ell) = \tilde f_\alpha(p \mid (c\ell)^{-1/\alpha})\,dp \tag{87}$$

where $\tilde f_\alpha(p \mid t)$ as in (57) is the structural density of the Poisson model for stable($\alpha$)
distributed $T$ conditioned on $T = t$. So the moment function $\mu_\alpha(q \mid t)$ appearing in
the EPPF (66) of this model can be interpreted in the present setting as

$$\mu_\alpha(q \mid t) = \mathbb{E}^0[(1 - G_1)^q \mid L_1 = c^{-1}t^{-\alpha}]. \tag{88}$$



8 The Brownian excursion partition 

In this section let $\Pi$ be the Brownian excursion partition, that is the random
partition of $\mathbb{N}$ generated by uniform random sampling of points from the interval $[0,1]$
partitioned by the excursion intervals of a standard Brownian motion $B$. According
to the result of [49] recalled in (84),

$\Pi$ is a $\mathrm{PK}(\rho_{1/2}) = \mathrm{PD}(\tfrac12, 0)$ partition. (89)

With conditioning on $B_1 = 0$, the process $B$ becomes a standard Brownian bridge.
So $\Pi$ given $B_1 = 0$ is a $\mathrm{PD}(\tfrac12, \tfrac12)$ partition, as discussed in the previous section.
Features of the distribution of $\Pi$ and the conditional distribution of $\Pi$ given $B_1 = 0$
were described in [46]. This section presents refinements of these results obtained by
conditioning on $L_1$, the local time of $B$ at $0$ up to time 1, with the usual
normalization of Brownian local time as occupation density relative to Lebesgue measure.
Unconditionally, $L_1$ has the same distribution as $|B_1|$, that is

$$\mathbb{P}(L_1 \in d\lambda) = \mathbb{P}(|B_1| \in d\lambda) = 2\varphi(\lambda)\,d\lambda \qquad (\lambda > 0)$$

where $\varphi(z) := (1/\sqrt{2\pi}) \exp(-\tfrac12 z^2)$ is the standard Gaussian density of $B_1$. Whereas
the conditional distribution of $L_1$ given $B_1 = 0$ is the Rayleigh distribution

$$\mathbb{P}(L_1 \in d\lambda \mid B_1 = 0) = \sqrt{2\pi}\,\lambda\,\varphi(\lambda)\,d\lambda \qquad (\lambda > 0).$$

Note from (85) that the $\tfrac12$-diversity of $\Pi$ is the random variable $\sqrt{2}L_1$. So the number
$K_n$ of blocks of $\Pi_n$ grows almost surely like $\sqrt{2n}\,L_1$ as $n \to \infty$. For $\lambda > 0$ let $\Pi(\lambda)$
denote a random partition with

$$\Pi(\lambda) \stackrel{d}{=} (\Pi \mid L_1 = \lambda) \stackrel{d}{=} (\Pi \mid L_1 = \lambda, B_1 = 0) \tag{90}$$

where $\stackrel{d}{=}$ denotes equality in distribution. So according to the previous discussion,

$\Pi(\lambda)$ is a $\mathrm{PK}(\rho_{1/2} \mid \tfrac12\lambda^{-2})$ partition (91)




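whose $\tfrac12$-diversity is $\sqrt{2}\lambda$. Let $\mathrm{PD}(\tfrac12 \,\|\, \lambda)$ denote the probability distribution of ranked frequencies
associated with $\Pi(\lambda)$, that is the common distribution of ranked lengths of excursions
of a Brownian motion or Brownian bridge over $[0,1]$ given $L_1 = \lambda$. Then by Definition
11 and (5^), for $\theta > -\tfrac12$ there is the identity of probability laws

$$\mathrm{PD}(\tfrac12, \theta) = \frac{1}{\mathbb{E}(|B_1|^{2\theta})} \int_0^\infty \mathrm{PD}(\tfrac12 \,\|\, \lambda)\, \lambda^{2\theta}\, 2\varphi(\lambda)\,d\lambda \tag{92}$$

where, according to the gamma($\tfrac12$) distribution of $\tfrac12 B_1^2$ and the duplication formula
for the gamma function,

$$\mathbb{E}(|B_1|^{2\theta}) = 2^{\theta}\,\frac{\Gamma(\theta+\frac12)}{\Gamma(\frac12)} = 2^{-\theta}\,\frac{\Gamma(2\theta+1)}{\Gamma(\theta+1)} \qquad (\theta > -\tfrac12). \tag{93}$$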

It was shown in [^] (see also [^]) that it is possible to construct the Brownian
excursion partitions as a partition valued fragmentation process $(\Pi(\lambda), \lambda \ge 0)$,
meaning that $\Pi(\lambda)$ is constructed for each $\lambda$ on the same probability space, in such a way
that $\Pi(\lambda)$ is a coarser partition than $\Pi(\mu)$ whenever $\lambda < \mu$. The question of whether
a similar construction is possible for index $\alpha$ instead of index $\tfrac12$ remains open. A
natural guess is that such a construction might be made with one of the self-similar
fragmentation processes of Bertoin [^], but Miermont and Schweinsberg [38] have
recently shown that a construction of this form is possible only for $\alpha = \tfrac12$.

8.1 Length biased sampling 

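Let $\tilde P_j(\lambda)$ denote the frequency of the $j$th class of $\Pi(\lambda)$. So $(\tilde P_j(\lambda), j = 1, 2, \dots)$ is
distributed like the lengths of excursions of $B$ over $[0,1]$ given $L_1 = \lambda$, as discovered by
a process of length-biased sampling. In view of Lévy's formula (5^) for the stable($\tfrac12$)
density, the formula (57) reduces for $\alpha = \tfrac12$ to the following more explicit formula for
the structural density of $\Pi(\lambda)$:

$$\mathbb{P}(\tilde P_1(\lambda) \in dp) = \frac{\lambda}{\sqrt{2\pi}}\, p^{-1/2}(1-p)^{-3/2} \exp\!\left(-\frac{\lambda^2}{2}\,\frac{p}{1-p}\right) dp \qquad (0<p<1) \tag{94}$$

or equivalently

$$\mathbb{P}(\tilde P_1(\lambda) \le y) = 2\Phi\!\left(\lambda\sqrt{\frac{y}{1-y}}\right) - 1 \qquad (0 \le y < 1) \tag{95}$$

where $\Phi(z) := \mathbb{P}(B_1 \le z)$ is the standard Gaussian distribution function. Put another
way, there is the equality in distribution

$$\tilde P_1(\lambda) \stackrel{d}{=} \frac{B_1^2}{\lambda^2 + B_1^2}. \tag{96}$$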
Furthermore, by a similar analysis using Lemma ^, there is the following result which
shows how to construct the whole sequence $(\tilde P_j(\lambda), j \ge 1)$ for any $\lambda > 0$ from a single
sequence of independent standard Gaussian variables. Then $\Pi(\lambda)$ can be constructed
by sampling from $(\tilde P_j(\lambda), j \ge 1)$ as discussed in Section 2.

Proposition 14 [3, Corollary 5] Fix $\lambda > 0$. A sequence $(\tilde P_j(\lambda), j \ge 1)$ is distributed
like a length-biased random permutation of the lengths of excursions of a Brownian
motion or standard Brownian bridge over $[0,1]$ conditioned on $L_1 = \lambda$ if and only if

$$\tilde P_j(\lambda) = \frac{\lambda^2}{\lambda^2 + S_{j-1}} - \frac{\lambda^2}{\lambda^2 + S_j} \tag{97}$$

where $S_j := \sum_{i=1}^j X_i$ for $X_i$ which are independent and identically distributed like $B_1^2$
for a standard Gaussian variable $B_1$.
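The Gaussian construction of Proposition 14 is easy to sketch in code. The example below assumes the reading $\tilde P_j(\lambda) = \lambda^2/(\lambda^2 + S_{j-1}) - \lambda^2/(\lambda^2 + S_j)$ with $S_0 = 0$ for formula (97); the function name `brownian_excursion_frequencies` and the truncation after `n_terms` terms are hypothetical choices made for illustration.

```python
import random

def brownian_excursion_frequencies(lam, n_terms, rng):
    """First n_terms length-biased frequencies given L_1 = lam, built as in (97).

    Returns the frequencies together with the mass lam^2 / (lam^2 + S_n) of
    the excursions not yet discovered after n_terms steps.
    """
    lam2 = lam * lam
    s = 0.0      # running sum S_j of squared standard Gaussian variables
    prev = 1.0   # lam^2 / (lam^2 + S_0) with S_0 = 0
    freqs = []
    for _ in range(n_terms):
        s += rng.gauss(0.0, 1.0) ** 2
        cur = lam2 / (lam2 + s)
        freqs.append(prev - cur)  # telescoping difference in (97)
        prev = cur
    return freqs, prev

freqs, undiscovered = brownian_excursion_frequencies(1.0, 1000, random.Random(0))
```

Because the differences telescope, the frequencies and the undiscovered mass add up to 1 by construction, and the first term alone reproduces the representation of $\tilde P_1(\lambda)$ as a ratio of a squared Gaussian variable.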

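Let $\mu(q \,\|\, \lambda)$ denote the $q$th moment of the distribution of $\tilde P_1(\lambda)$. So in the notation
of (65) and (91)

$$\mu(q \,\|\, \lambda) := \mathbb{E}[(\tilde P_1(\lambda))^q] = \mu_{1/2}(q \mid \tfrac12\lambda^{-2}). \tag{98}$$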



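Lemma 15 For each $\lambda > 0$

$$\mu(q \,\|\, \lambda) = \mathbb{E}\left[\left(\frac{B_1^2}{\lambda^2 + B_1^2}\right)^{q}\right] = \mathbb{E}(|B_1|^{2q})\, h_{-2q}(\lambda) \qquad (q > -\tfrac12) \tag{99}$$

where $\mathbb{E}(|B_1|^{2q})$ is given by (93) and $h_{-2q}$ is the Hermite function of index $-2q$, that
is $h_0(\lambda) = 1$ and for $-2q \notin \{0, 1, 2, \dots\}$

$$h_{-2q}(\lambda) := \frac{1}{2\Gamma(2q)} \sum_{j=0}^\infty \Gamma(q + j/2)\, 2^{q+j/2}\, \frac{(-\lambda)^j}{j!}. \tag{100}$$

Also,

$$\mu(q \,\|\, \lambda) = \mathbb{E}[\exp(-\lambda\sqrt{2\Gamma_q})] \qquad (q > 0) \tag{101}$$

where $\Gamma_q$ denotes a Gamma random variable with parameter $q$:
$\mathbb{P}(\Gamma_q \in dt) = \Gamma(q)^{-1} t^{q-1} e^{-t}\,dt$ $(t > 0)$.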



Proof. The first equality in (99) is read from (96). The second equality in (99) is
the integral representation of the Hermite function provided by Lebedev [35, Problem
10.8.1], and (100) is read from [35, (10.4.3)]. According to another well known integral
representation of the Hermite function [35, (10.5.2)], [^, 8.3 (3)], for $q > 0$

$$h_{-2q}(x) = \frac{1}{\Gamma(2q)} \int_0^\infty t^{2q-1} e^{-t^2/2 - xt}\,dt = \frac{2^{q-1}}{\Gamma(2q)} \int_0^\infty v^{q-1} e^{-v - x\sqrt{2v}}\,dv. \tag{102}$$

Formula (101) follows easily from this and (99). □
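The identity

$$\mathbb{E}\left[\left(\frac{B_1^2}{\lambda^2 + B_1^2}\right)^{q}\right] = \mathbb{E}[\exp(-\lambda\sqrt{2\Gamma_q})] \qquad (q > 0), \tag{103}$$

which is implied by the previous proposition, can also be checked by the following
argument suggested by Marc Yor. Let $X$ be a positive random variable independent
of $\Gamma_q$, and let $\varepsilon$ with $\varepsilon \stackrel{d}{=} \Gamma_1$ be a standard exponential variable independent of both
$X$ and $\Gamma_q$. Then by elementary conditioning arguments, for $\xi > 0$

$$\mathbb{E}\left[\left(\frac{X}{\xi + X}\right)^{q}\right] = \mathbb{E}\left[e^{-\xi\Gamma_q/X}\right] = \mathbb{P}(\varepsilon X/\Gamma_q > \xi). \tag{104}$$

Take $X = B_1^2$ and $\xi = \lambda^2$, and use the identity $\varepsilon B_1^2 \stackrel{d}{=} \varepsilon^2/2$, which is a well known
probabilistic expression of the gamma duplication formula, to deduce (103) from (104).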

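The following display identifies $h_\nu(z)$ in the notation of various authors:

$$\begin{aligned}
h_\nu(z) &= 2^{-\nu/2} H_\nu(z/\sqrt{2}) = 2^{\nu/2}\, \Psi(-\nu/2, 1/2, z^2/2) && \text{(Lebedev [35])} \\
&= 2^{\nu/2}\, U(-\nu/2, 1/2, z^2/2) && \text{(Abramowitz and Stegun)} \\
&= e^{z^2/4}\, U(-\nu - \tfrac12, z) && \text{(Miller)} \\
&= e^{z^2/4}\, D_\nu(z) && \text{(Erdélyi, Toscano)}
\end{aligned}$$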

The functions $U(a,z)$ and $D_\nu(z)$ are known as parabolic cylinder functions, Weber
functions or Whittaker functions. The function $U(a,b,z)$, which is available in
Mathematica as HypergeometricU[a,b,z], is a confluent hypergeometric function of the
second kind. Note that $h_n(z)$ defined for $n = 0, 1, 2, \dots$ by continuous extension of
(100) is the sequence of Hermite polynomials orthogonal with respect to the standard
Gaussian density $\varphi(x)$. Also, the function $h_{-1}(x)$ for real $x$ is identified as Mills'
ratio [^, 33.7]:

$$h_{-1}(x) = \frac{\mathbb{P}(B_1 > x)}{\varphi(x)} = e^{x^2/2} \int_x^\infty e^{-z^2/2}\,dz. \tag{105}$$

For all complex $\nu$ and $z$, the Hermite function satisfies the recursion

$$h_{\nu+1}(z) = z h_\nu(z) - \nu h_{\nu-1}(z), \tag{106}$$

which combined with (105) and $h_0(x) = 1$ yields

$$h_{-2}(x) = 1 - x h_{-1}(x) \tag{107}$$

$$2!\, h_{-3}(x) = -x + (1 + x^2) h_{-1}(x) \tag{108}$$

$$3!\, h_{-4}(x) = 2 + x^2 - (3x + x^3) h_{-1}(x) \tag{109}$$

and so on. See [^] for further interpretations of the Hermite function in terms of
Brownian motion and related stochastic processes.
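The recursion for Hermite functions of negative integer index can be illustrated numerically. The sketch below assumes the recursion $h_{\nu+1}(z) = z h_\nu(z) - \nu h_{\nu-1}(z)$ started from $h_0 = 1$ and Mills' ratio $h_{-1}$, and compares it with a plain midpoint-rule evaluation of the integral representation $h_{-m}(x) = \Gamma(m)^{-1}\int_0^\infty t^{m-1}e^{-t^2/2 - xt}\,dt$. The function names and the quadrature parameters are assumptions made for the example.

```python
import math

def mills_ratio(x):
    """h_{-1}(x) = P(B_1 > x) / phi(x) for a standard Gaussian B_1."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return tail / phi

def hermite_negative(m, x):
    """h_{-m}(x) for m = 0, 1, 2, ... by running the recursion downward."""
    h_prev, h_cur = 1.0, mills_ratio(x)  # h_0 and h_{-1}
    if m == 0:
        return h_prev
    for j in range(1, m):
        # recursion with nu = -j: h_{-j+1} = x h_{-j} + j h_{-j-1}
        h_prev, h_cur = h_cur, (h_prev - x * h_cur) / j
    return h_cur

def hermite_negative_integral(m, x, upper=20.0, steps=20000):
    """Midpoint-rule approximation of the integral representation, for m >= 1."""
    dt = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += t ** (m - 1) * math.exp(-0.5 * t * t - x * t)
    return total * dt / math.gamma(m)

x = 1.0
recursion_value = hermite_negative(4, x)
integral_value = hermite_negative_integral(4, x)
```

The downward recursion involves subtractions of nearly equal quantities, so it is only a reasonable sketch for small indices; for large indices the integral form or a library implementation of parabolic cylinder functions would be a safer choice.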
8.2 Partition probabilities

Recall the notation 



Corollary 16 The distribution of $\Pi(\lambda)$, a Brownian excursion partition conditioned
on $L_1 = \lambda$, is determined by the following EPPF: for $n_1,\dots,n_k$ with $\sum_{i=1}^k n_i = n$

$$p_{1/2}(n_1,\dots,n_k \,\|\, \lambda) = 2^{n-k}\, \lambda^{k-1}\, h_{k+1-2n}(\lambda) \prod_{i=1}^k [\tfrac12]_{n_i-1}. \tag{110}$$

Proof. This is read from (66), (98) and (99). □



Formula (110) combined with (^) gives an expression in terms of the Hermite
function for the positive integer moments of the sum $S_m(\lambda)$ of $m$th powers of lengths
of excursions of Brownian motion on $[0,1]$ given $L_1 = \lambda$. This formula for $m = 2$
was derived in another way by Janson [25, Theorem 7.4]. There the distribution of
$S_2(\lambda)$ appears as the asymptotic distribution, in a suitable limit regime, of the cost
of linear probing hashing.

According to (91) and Definition 11, for each $\theta > -\tfrac12$ the EPPF (110) describes
the conditional distribution of a $\mathrm{PD}(\tfrac12, \theta)$ partition $(\Pi_n)$ given $\lim_n K_n/\sqrt{2n} = \lambda$,
where $K_n$ is the number of blocks of $\Pi_n$. Easily from (110), for each fixed $\lambda > 0$, a
sequential description of $(\Pi_n(\lambda), n = 1, 2, \dots)$ is obtained by replacing the prediction
rules (75) and (76) by

$$\mathbb{P}(j\uparrow \mid n_1,\dots,n_k) = (2n_j - 1)\, \frac{h_{k-1-2n}(\lambda)}{h_{k+1-2n}(\lambda)} \qquad (1 \le j \le k) \tag{111}$$

$$\mathbb{P}((k+1)\uparrow \mid n_1,\dots,n_k) = \frac{\lambda\, h_{k-2n}(\lambda)}{h_{k+1-2n}(\lambda)}. \tag{112}$$

The addition rule for the EPPF (110) is equivalent to the fact that these transition
probabilities sum to 1. As a check, this is implied by the recurrence formula (106) for
the Hermite function.

Corollary 17 Let $K_n(\lambda)$ be the number of blocks of $\Pi_n(\lambda)$, where $(\Pi_n(\lambda), n = 1, 2, \dots)$
is the Brownian excursion partition conditioned on $L_1 = \lambda$. Then $(K_n(\lambda), n = 1, 2, \dots)$
is a Markov chain with the following inhomogeneous transition probabilities:
for $1 \le k \le n$

$$\mathbb{P}(K_{n+1}(\lambda) = k \mid K_n(\lambda) = k) = (2n - k)\, \frac{h_{k-1-2n}(\lambda)}{h_{k+1-2n}(\lambda)} \tag{113}$$

$$\mathbb{P}(K_{n+1}(\lambda) = k+1 \mid K_n(\lambda) = k) = \frac{\lambda\, h_{k-2n}(\lambda)}{h_{k+1-2n}(\lambda)}. \tag{114}$$

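Moreover, the distribution of $K_n(\lambda)$ is given by the formula

$$\mathbb{P}(K_n(\lambda) = k) = \frac{(2n-k-1)!\; \lambda^{k-1}\, h_{k+1-2n}(\lambda)}{(n-k)!\,(k-1)!\; 2^{n-k}} \qquad (1 \le k \le n). \tag{115}$$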

Proof. The Markov property of $(K_n(\lambda), n = 1, 2, \dots)$ and the transition probabilities
(113)-(114) follow easily from (111)-(112). Then (115) follows by induction on $n$,
using the forwards equations implied by the transition probabilities. □
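The forwards equations mentioned in the proof can be run directly for small $n$. The sketch below assumes the transition probabilities of Corollary 17 in the form: stay at $k$ with probability $(2n-k)\,h_{k-1-2n}(\lambda)/h_{k+1-2n}(\lambda)$, move to $k+1$ with probability $\lambda\,h_{k-2n}(\lambda)/h_{k+1-2n}(\lambda)$. The Hermite functions of negative integer index are computed from Mills' ratio by the three-term recursion; all function names are hypothetical.

```python
import math

def hermite_table(max_m, x):
    """List h with h[m] = h_{-m}(x) for m = 0, ..., max_m, via the downward recursion."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    h = [1.0, tail / phi]  # h_0 = 1 and h_{-1} = Mills' ratio
    for j in range(1, max_m):
        h.append((h[j - 1] - x * h[j]) / j)  # h_{-j-1} = (h_{-j+1} - x h_{-j}) / j
    return h[: max_m + 1]

def kn_distribution(n, lam):
    """Law of K_n(lam) as a dict k -> probability, by the forwards equations."""
    h = hermite_table(2 * n, lam)
    dist = {1: 1.0}  # K_1 = 1
    for m in range(1, n):  # step from K_m to K_{m+1}
        new = {}
        for k, p in dist.items():
            denom = h[2 * m - k - 1]                      # h_{k+1-2m}
            stay = (2 * m - k) * h[2 * m - k + 1] / denom  # uses h_{k-1-2m}
            up = lam * h[2 * m - k] / denom                # uses h_{k-2m}
            new[k] = new.get(k, 0.0) + p * stay
            new[k + 1] = new.get(k + 1, 0.0) + p * up
        dist = new
    return dist

law_of_k3 = kn_distribution(3, 1.0)
```

For $n = 3$ the three resulting probabilities can be compared with closed expressions in $h_{-1}(\lambda)$ alone, which is a convenient consistency check on both the recursion and the transition probabilities. The downward recursion loses accuracy as the index grows, so this is a sketch for small $n$ only.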



Let $K_n$ denote the number of blocks of $\Pi_n$ where $(\Pi_n)$ is the unconditioned
Brownian excursion partition. Then, from the discussion around (90),

$$(K_n(\lambda), n \ge 1) \stackrel{d}{=} (K_n, n \ge 1 \mid \lim_n K_n/\sqrt{2n} = \lambda). \tag{116}$$

According to (89), (75) and (76), the sequence $(K_n, n \ge 1)$ is an inhomogeneous
Markov chain with transition probabilities

$$\mathbb{P}(K_{n+1} = k \mid K_n = k) = \frac{2n-k}{2n} \tag{117}$$

$$\mathbb{P}(K_{n+1} = k+1 \mid K_n = k) = \frac{k}{2n} \tag{118}$$

which imply that the unconditional distribution of $K_n$ is given by the formula
[^, Corollary 3]

$$\mathbb{P}(K_n = k) = \binom{2n-k-1}{n-1}\, 2^{k+1-2n} \qquad (1 \le k \le n). \tag{119}$$

Due to (116), for each $\lambda > 0$ the inhomogeneous Markov chain $(K_n(\lambda), n \ge 1)$ has
the same co-transition probabilities as $(K_n, n \ge 1)$. From (117), (118) and (119), the
co-transition probabilities of $(K_n, n \ge 1)$ are

$$\mathbb{P}(K_n = k \mid K_{n+1} = k) = \frac{2(n-k+1)}{2n-k+1} \tag{120}$$

$$\mathbb{P}(K_n = k-1 \mid K_{n+1} = k) = \frac{k-1}{2n-k+1}. \tag{121}$$
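The unconditional law of $K_n$ and its co-transition probabilities can be verified exactly with rational arithmetic. The sketch below assumes the transitions in (117)-(118) to be: stay at $k$ with probability $(2n-k)/(2n)$, move to $k+1$ with probability $k/(2n)$; it then compares the forwards equations with the closed form $\binom{2n-k-1}{n-1}2^{k+1-2n}$ and with the co-transition probability $2(n-k+1)/(2n-k+1)$. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def kn_law(n):
    """Exact law of K_n as a dict k -> Fraction, from the forwards equations."""
    dist = {1: Fraction(1)}
    for m in range(1, n):  # step from K_m to K_{m+1}
        new = {}
        for k, p in dist.items():
            new[k] = new.get(k, Fraction(0)) + p * Fraction(2 * m - k, 2 * m)  # stay at k
            new[k + 1] = new.get(k + 1, Fraction(0)) + p * Fraction(k, 2 * m)  # new block
        dist = new
    return dist

def kn_formula(n, k):
    """Closed form binom(2n-k-1, n-1) * 2^(k+1-2n)."""
    return Fraction(comb(2 * n - k - 1, n - 1), 2 ** (2 * n - k - 1))

def cotransition_stay(n, k):
    """P(K_n = k | K_{n+1} = k), computed by Bayes' rule from the forward chain."""
    return kn_law(n)[k] * Fraction(2 * n - k, 2 * n) / kn_law(n + 1)[k]

formula_matches = all(
    kn_law(n)[k] == kn_formula(n, k) for n in range(1, 9) for k in range(1, n + 1)
)
```

Using `Fraction` keeps every probability exact, so agreement between the recursion and the closed forms is checked as equality rather than up to rounding.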

As a check, the fact that $(K_n(\lambda), n \ge 1)$ has the same co-transition probabilities
can be read from (113), (114) and (115). It can be shown that the Markov chains
$(K_n(\lambda), n \ge 1)$ for $\lambda \in [0, \infty]$, with definition by weak continuity for $\lambda = 0$ or $\infty$,
are the extreme points of the convex set of all laws of Markov chains with these
co-transition probabilities. A generalization of this fact, to $\alpha \in (0,1)$ instead of $\alpha = \tfrac12$,
and similar considerations for $\alpha = 0$, yield the second sentence of Theorem ^.
To illustrate the formulas above, according to (9^) and (^), or (110) for $n = 2$,
given $L_1 = \lambda$, two independent uniform $[0,1]$ variables fall in the same excursion
interval of the Brownian motion with probability

$$p_{1/2}(2 \,\|\, \lambda) = \mu(1 \,\|\, \lambda) = h_{-2}(\lambda) = 1 - \lambda h_{-1}(\lambda) \tag{122}$$

and in different excursion intervals with probability $\lambda h_{-1}(\lambda)$. According to (110)
for $n = 3$, given $L_1 = \lambda$, three independent random points $U_1, U_2, U_3$ with
uniform distribution on $[0,1]$ fall in the same excursion interval of a Brownian motion
or Brownian bridge with probability

$$\mathbb{P}(K_3(\lambda) = 1) = p_{1/2}(3 \,\|\, \lambda) = 3h_{-4}(\lambda) = 1 + \tfrac12\lambda^2 - (\tfrac32\lambda + \tfrac12\lambda^3)\,h_{-1}(\lambda) \tag{123}$$

while $U_1$ and $U_2$ fall in one excursion interval and $U_3$ in another with probability

$$\tfrac13\,\mathbb{P}(K_3(\lambda) = 2) = p_{1/2}(2,1 \,\|\, \lambda) = \lambda h_{-3}(\lambda) = -\tfrac12\lambda^2 + (\tfrac12\lambda + \tfrac12\lambda^3)\,h_{-1}(\lambda) \tag{124}$$

and the three points fall in three different excursion intervals with probability

$$\mathbb{P}(K_3(\lambda) = 3) = p_{1/2}(1,1,1 \,\|\, \lambda) = \lambda^2 h_{-2}(\lambda) = \lambda^2 - \lambda^3 h_{-1}(\lambda). \tag{125}$$

As a check, the sum of expressions for $\mathbb{P}(K_3(\lambda) = k)$ over $k = 1, 2, 3$ reduces to 1.
Since

$$\mathbb{P}(K_n(\lambda) = k) = \sum_{n_1 \ge \cdots \ge n_k} \#(n_1,\dots,n_k)\; p_{1/2}(n_1,\dots,n_k \,\|\, \lambda) \tag{126}$$

where the sum is over all decreasing sequences of positive integers $(n_1,\dots,n_k)$ with
sum $n$, and $\#(n_1,\dots,n_k)$ is the number of distinct partitions of $\mathbb{N}_n$ into $k$ subsets of
sizes $(n_1,\dots,n_k)$, formula (115) amounts to

$$\sum_{n_1 \ge \cdots \ge n_k} \#(n_1,\dots,n_k) \prod_{i=1}^k [\tfrac12]_{n_i-1} = \binom{2n-k-1}{n-1}\, \frac{\Gamma(n)}{\Gamma(k)}\, 2^{2k-2n} \tag{127}$$

which can be checked as follows. According to (74) and (88), the unconditional EPPF
of the Brownian excursion partition $\Pi$ is

$$p_{1/2,0}(n_1,\dots,n_k) = \frac{\Gamma(k)}{2^{k-1}\,\Gamma(n)} \prod_{i=1}^k [\tfrac12]_{n_i-1} \tag{128}$$

so (127) can be deduced from (128), (119), and the unconditioned form of (126).






8.3 Some identities

As a consequence of (92) and (^9), for all $q > -\tfrac12$ and $\theta > -\tfrac12$ there is the identity

$$\frac{1}{\mathbb{E}(|B_1|^{2\theta})} \int_0^\infty \mu(q \,\|\, \lambda)\, \lambda^{2\theta}\, 2\varphi(\lambda)\,d\lambda = \frac{\Gamma(\theta+1)\,\Gamma(q+\frac12)}{\Gamma(\frac12)\,\Gamma(q+\theta+1)} \tag{129}$$

where the right side is the $q$th moment of the beta$(\tfrac12, \tfrac12 + \theta)$ structural distribution
of $\mathrm{PD}(\tfrac12, \theta)$, and on the left side this moment is computed by conditioning on $L_1$.
As in (^), for each fixed $q$ this formula provides a Mellin transform which uniquely
determines $\mu(q \,\|\, \lambda)$ as a function of $\lambda$. In view of (129) and (93), the formula (99)
for $\mu(q \,\|\, \lambda)$ in terms of the Hermite function amounts to the identity

$$\int_0^\infty h_{-2q}(\lambda)\, \lambda^{2\theta}\, 2\varphi(\lambda)\,d\lambda = \frac{2^{-q-\theta}\,\Gamma(2\theta+1)}{\Gamma(q+\theta+1)}. \tag{130}$$

As checks, since $h_0(x) = 1$ and $h_{-1}(x) = (1 - \Phi(x))/\varphi(x)$, the case $q = 0$ is obvious, and the
case $q = \tfrac12$ is easily verified since then the left side of (130) equals $(2\theta+1)^{-1}\,\mathbb{E}(|B_1|^{2\theta+1})$
by integration by parts. Formula (130) can then be verified for $q = m/2$ for all
$m = 0, 1, 2, \dots$, using the recursion (106). Formula (130) was just derived for $q > -\tfrac12$,
but both sides are entire functions of $q$ so the identity holds for all $q \in \mathbb{C}$. Using the
series formula (100) and integrating term by term, the substitution $r = \theta + \tfrac12$ allows
the identity (130) to be rewritten in the symmetric form



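$$\sum_{j=0}^\infty \Gamma(q + \tfrac{j}{2})\,\Gamma(r + \tfrac{j}{2})\, \frac{(-2)^j}{j!} = \frac{\sqrt{\pi}\; 2^{2-2q-2r}\, \Gamma(2q)\,\Gamma(2r)}{\Gamma(q+r+1/2)} \tag{131}$$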



where the series is absolutely convergent for real q and r with q + r + ^ < — 1, 
and can otherwise be summed by Abel's method provided neither 2q nor 2r is a non- 
positive integer. This version of the identity is easily verified using standard identities 
involving Gauss's hypergeometric function and the gamma function. For $-2q = n$ a
positive integer, when $h_n$ is the $n$th Hermite polynomial

$$h_n(x) = \sum_{k=0}^{\lfloor n/2 \rfloor} h_{n,k}\, x^{n-2k} \quad \text{with} \quad h_{n,k} = (-1)^k\, \frac{n!}{2^k\, k!\,(n-2k)!}$$

the identity (130) reduces easily to the following pair of identities of polynomials in
$\theta$, which relate the rising and falling factorials $[x]_n := x(x+1)\cdots(x+n-1)$ and
$(x)_n := x(x-1)\cdots(x-n+1)$, and which are easily verified directly: for $m = 0, 1, 2, \dots$

$$\sum_{k=0}^{m} h_{2m,k}\, 2^{-k}\, [\theta + \tfrac12]_{m-k} = (\theta)_m$$

and

$$\sum_{k=0}^{m} h_{2m+1,k}\, 2^{-k}\, [\theta + 1]_{m-k} = (\theta - \tfrac12)_m.$$
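The two polynomial identities can be checked exactly for particular values of $m$ and $\theta$. The sketch below assumes the readings $\sum_{k=0}^m h_{2m,k}2^{-k}[\theta+\tfrac12]_{m-k} = (\theta)_m$ and $\sum_{k=0}^m h_{2m+1,k}2^{-k}[\theta+1]_{m-k} = (\theta-\tfrac12)_m$, with $h_{n,k} = (-1)^k n!/(2^k k!(n-2k)!)$ the coefficient of $x^{n-2k}$ in the $n$th Hermite polynomial; function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def hermite_coeff(n, k):
    """Coefficient of x^(n-2k) in the n-th Hermite polynomial (Gaussian weight)."""
    return Fraction((-1) ** k * factorial(n), factorial(k) * factorial(n - 2 * k) * 2 ** k)

def rising(x, m):
    """Rising factorial [x]_m = x (x+1) ... (x+m-1)."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out

def falling(x, m):
    """Falling factorial (x)_m = x (x-1) ... (x-m+1)."""
    out = Fraction(1)
    for i in range(m):
        out *= x - i
    return out

def even_identity_holds(m, theta):
    lhs = sum(
        hermite_coeff(2 * m, k) * Fraction(1, 2 ** k) * rising(theta + Fraction(1, 2), m - k)
        for k in range(m + 1)
    )
    return lhs == falling(theta, m)

def odd_identity_holds(m, theta):
    lhs = sum(
        hermite_coeff(2 * m + 1, k) * Fraction(1, 2 ** k) * rising(theta + 1, m - k)
        for k in range(m + 1)
    )
    return lhs == falling(theta - Fraction(1, 2), m)

thetas = [Fraction(0), Fraction(1, 3), Fraction(-1, 4), Fraction(5, 2)]
all_hold = all(
    even_identity_holds(m, th) and odd_identity_holds(m, th)
    for m in range(7) for th in thetas
)
```

Both sides are polynomials of degree $m$ in $\theta$, so agreement at more than $m$ distinct rational values of $\theta$ already confirms the identity for that $m$.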
Thus the coefficients of the Hermite polynomials are related to some instances of
generalized Stirling numbers [22,